
An AI Practitioner’s Guide to the Kdrama Start-Up


Author(s): JD Dantes

“We got 99.9% accuracy!!!” and other things to point out.

From Episode 5. This and succeeding images are screenshots by the author, unless stated otherwise.

Well, more of a commentary, really.

As a computer engineer with an interest in startups and experience building real-time computer vision and other AI systems, I’ve had my fair share of discussions with some colleagues and friends. Thought I’d share some of these with you.

Note: Possible spoilers ahead, you may want to go back to this post after you’ve watched the series.

First, the non-AI stuff.

A new metric for success.
Episode 4. I like you because…
I have big hands so… Suzy likes me?! Source
Plot armor? Nah, people just happen to be there at the same time.

Those out of the way, let’s go through the AI part, roughly in order of appearance in the series. Looking back, the details got pretty lengthy, so feel free to skip to the parts that interest you. Here we go!

#1. Tarzan and Jane

Episode 5. The Tarzan and Jane metaphor.

I like how the show attempts to explain artificial intelligence and machine learning to people who are not working in that field:

“So Tarzan keeps on trying. He catches a snake for her, but she doesn’t like it. One day, he gives her a cute cute rabbit, and that makes her happy.”

This trial-and-error approach mirrors how we feed different images into an artificial neural network (ANN) several times until it learns to predict that an image of a dog contains a dog, an image of a cat contains a cat, and so on. Another term for this learning phase is training. You train the neural network with different images until it reaches the desired accuracy.

A few more details for the interested:

You can try a color picker on Google. See how the RGB values range from 0 to 255?
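To make this concrete, here’s a minimal sketch of how an image looks to a computer: a grid of pixels, each an (R, G, B) triple of numbers from 0 to 255. The tiny 2×2 “image” below is made up purely for illustration:

```python
# A made-up 2x2 "image": each pixel is an (R, G, B) triple with values 0 to 255.
image = [
    [(255, 0, 0), (0, 255, 0)],      # top row: a red pixel, a green pixel
    [(0, 0, 255), (255, 255, 255)],  # bottom row: a blue pixel, a white pixel
]

red_pixel = image[0][0]
print(red_pixel)  # (255, 0, 0): full red, no green, no blue
# To a neural network, an image is just a grid of numbers like these.
```

Real images are of course much larger (a 1080p photo is a 1920×1080 grid of such triples), but the idea is the same.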

#2. It’s actually pretty well-researched.

There’s this classic scene where two people get hacked and decide to work faster by typing on the same keyboard:

Literal pair programming. Last I checked, it shouldn’t look like that. Source

When a show features code snippets or hacking scenes, it’s not uncommon for programmers to pause and inspect the code on screen to see if it makes sense. Even Vagabond, a relatively recent series that Bae Suzy also starred in, had this pretty questionable scene:

From Vagabond Episode 5. JavaScript, and some random website?
…to hack what looks like Windows XP?

In that scene, Suzy plugs a USB drive into the computer, and out comes some code that looks like JavaScript, except that those </script> tags are more commonly used for building websites, and I don’t think Windows XP was written in JavaScript either. Well, I could be wrong.

Objective Function

Going back to Start-Up, early on we are exposed to a little more AI-speak. In Episode 3, Dosan introduces us to these terms:

Episode 3. Objective what?
Episode 3. Note these in the background: (1) Generator, (2) Discriminator, (3) the expression on the upper right with the log and triangle symbols

Well, I applaud, because they’re actually legitimate AI references! Whether they’re used like that in casual conversation is another matter, but let’s put that thought aside.

I’ll wind back a bit for the curious, but if you’re not too much into the details, feel free to skip to the next part.

Let’s go a few years (or even decades) back in time, when “machine learning” and “neural networks” weren’t as trendy, hardware was nowhere near as good, and people generally referred to the field as data science, data analytics, or simply statistics.

Here’s a classic task — let’s say we have the data of house prices in the city, and we want to predict what the house prices will be like some years from now, in 2025. For the sake of illustration, let’s say our data looks like this in some currency:

House prices in 2010, 2015, and 2020. What will it be in 2025?

How do we predict the prices in 2025? Well, for this pretty ideal data, we can draw a line, just like this:

We model the data using a line — we can use it to predict prices at any time in the future!

So if we wanted to predict the prices in 2025, we can just check the line to see the corresponding prices, which in this case is 400 (in some currency)!
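As a rough sketch of what “drawing a line” means to a computer, here’s a least-squares line fit in plain Python. The prices (100, 200, and 300) are assumed for illustration; they’re chosen to be consistent with the figures, where the best-fit line predicts 400 for 2025:

```python
# Assumed illustrative data: prices of 100, 200, and 300 in 2010, 2015, and 2020.
years = [2010, 2015, 2020]
prices = [100, 200, 300]

# Least-squares fit of a line: price = slope * year + intercept.
n = len(years)
mean_year = sum(years) / n
mean_price = sum(prices) / n
slope = (sum((x - mean_year) * (y - mean_price) for x, y in zip(years, prices))
         / sum((x - mean_year) ** 2 for x in years))
intercept = mean_price - slope * mean_year

prediction_2025 = slope * 2025 + intercept
print(round(prediction_2025))  # 400
```

Libraries like NumPy or scikit-learn do this (and far more) for you, but underneath it’s the same idea: find the line’s slope and intercept from the data, then extend it forward.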

But now, how does a computer “draw” a line anyway? One way is through trial and error. Another is to do some math to derive a formula. Or a mix of both. With trial and error, though, we could end up with a less accurate model along the way, like this one:

Another possible model we might end up with, and a less optimal one at that.

Here, our model isn’t as good, and in 2025 it predicts prices to be 300, which is an underestimate! Humans may occasionally “eyeball” the data and say that it “looks” or “fits” right. But for a computer, a consistent, quantitative measure is needed. Got some thoughts on how to do this? I’ll give you some time to think about it.

We can just measure the distance! For this, we can vertically subtract the actual prices (red) from the predicted prices (blue):

Get the vertical distances just by subtracting them!

Note that for 2010, I placed vertical bars to indicate that I’m taking the absolute value, or just the magnitude of the result, so we don’t have to deal with negative numbers. So the total distance here is 100. What about the earlier graph where the line fit perfectly? The predicted prices exactly match the actual prices, so if we subtract one from the other…

For the earlier graph, there is zero distance between the actual and predicted prices!

…we find that the total distance of this “best fit line” is 0! Earlier, we just knew by “eyeballing” that the first line fit the data better, but now we also have a quantitative measure. Other terms for this distance are “cost” and “loss”. So our first model has a loss of zero, which is lower (and therefore better) than the other model’s loss of 100.
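Here’s that distance computation as a minimal Python sketch. The actual and predicted prices are assumed values, chosen to match the figures (a perfect fit, and a model that’s off by 50 at each end):

```python
# Assumed illustrative values: actual prices vs. two candidate models' predictions.
actual      = [100, 200, 300]
good_model  = [100, 200, 300]  # the line that passes through every point
worse_model = [150, 200, 250]  # the less optimal line

def total_distance(actual, predicted):
    # Sum of absolute differences between actual and predicted prices.
    return sum(abs(a - p) for a, p in zip(actual, predicted))

print(total_distance(actual, good_model))   # 0
print(total_distance(actual, worse_model))  # 100
```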

In Start-Up, there was this scene:

Episode 5. The twins’ graph had this nice downward trend.

The MIT twins had this graph which goes down nicely. See how the y-axis says “loss”? So over time (the x-axis says “epoch”, one full pass over the training images), their model was able to minimize the loss, meaning it was performing quite well. We say that their model converged.

Meanwhile, for the Samsan team, theirs looked more chaotic, going up and down.

On the other hand, for the Samsan team, things didn’t go as smoothly, and their loss graph kept on bouncing up and down. Their model did not converge. What could cause such non-convergence? I’ll discuss a bit more in the next part.

Minor anecdote: in the episode, their graph gets plotted out after a few seconds. In practice, training can take significantly longer: minutes, hours, or even days, depending on the number of images, the complexity of the neural network, and the hardware available. If your model does not converge, you tweak a few things, then restart training to see if it converges after the changes. This is part of the reason why machine learning, especially neural networks with many or deep layers (“deep learning”), has only advanced significantly in recent years, despite the underlying ideas dating back decades. The performance and cost of GPUs or custom AI hardware have been (and continue to be) a bottleneck in AI research.

Minimizing the Loss

In our example with the house prices, I manually had to take the absolute value of the 2010 distance so that we would not be dealing with negative values. From the perspective of the computer, though, there are more convenient ways than individually looking at the negative values and flipping the sign. One such way is to take the square of the value (i.e., multiply it by itself)! So, instead of |100 − 150|, we just do (100 − 150) × (100 − 150) and the value will come out positive. To complete the example, the whole thing will now look like:

Instead of manually flipping negative signs, we take the square of the numbers for convenience.

So now the value of the total loss is different, but we still have a quantitative measure nonetheless. If we do it for the first line which passed cleanly through all the points, the loss would still be zero and less than 5000, so we can still conclude that it’s the better model.
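The squared version is a one-character change from the absolute-distance sketch. Again, the prices are assumed illustrative values consistent with the figures:

```python
# Same assumed numbers as before; squaring replaces the absolute value.
actual      = [100, 200, 300]
good_model  = [100, 200, 300]
worse_model = [150, 200, 250]

def squared_loss(actual, predicted):
    # (100 - 150)^2 comes out positive without any manual sign flipping.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

print(squared_loss(actual, good_model))   # 0
print(squared_loss(actual, worse_model))  # 5000
```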

Aside from computational convenience, using squares for the loss function actually gives us a nice visualization for what the neural network tries to do. What do quadratic equations like the following look like?

A quadratic equation. What does it look like?

If you recall your algebra or physics concepts, these are parabolas! You can check and play around with them on sites like WolframAlpha. Or shoot a basketball — that will form a parabolic arc governed by quadratic equations.

Quadratic equations form parabolas. You can use tools like WolframAlpha to visualize them.

Now, let’s assume that the loss as computed from some input x is some quadratic expression, visualized by a generic parabola like this:

Some parabolic loss graph. We want to reach the bottom.

So as before, the model’s parameters (weights) are initially random (since we’re just guessing), the loss is high, and we’re somewhere along the loss curve. Again, we want to minimize the value of the loss, which means moving towards the bottom of the curve. How do we do this?

Here’s one way to think about it: imagine that you were in the mountains. You want to get to the valley or base of the mountain, but it’s dark and you can’t see that far in the distance. What can you do?

Thought of a solution? Here’s another hint: imagine that you’re a marble, placed randomly on a bowl. How would you get to the bottom?

Well, you just roll down the incline! Pretty simple, right?

Looking at the parabolic loss curve, we can draw a line tangent to where we are to emphasize the incline. From the perspective of the computer, we can do some math to quantify the slope, or the direction of the incline. If the slope is positive, then the incline rises to the right, so we want to move left to get to the bottom. If the slope is negative, then the incline is already going downwards, so we can keep going right.

Slope is positive (pointing to the upper right) so we go left. If we were on the opposite end, slope would be negative and point downwards and to the right, so we would go to the right in that case.

And, that’s it! We just keep on doing this until we reach the bottom, where the loss is minimal. One question remains though — how much should we step? Well, we can set this to be some arbitrary value. However, if we step too little, then we may take too long to get to the bottom — remember when I mentioned that the training process could take hours or days? This is one factor.

“A journey of a thousand miles begins with a single step.” But if the journey takes too long, try larger steps.
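Putting the slope rule and the step size together, the “roll downhill” loop can be sketched in a few lines. This assumes a simple parabolic loss of x², whose slope at any point is 2x:

```python
def loss(x):
    return x ** 2          # a simple parabolic loss, with its minimum at x = 0

def slope(x):
    return 2 * x           # the derivative of x^2

x = 5.0                    # start somewhere random on the curve
step_size = 0.1            # how far to move on each iteration

for _ in range(100):
    # Step against the slope: left when it's positive, right when it's negative.
    x = x - step_size * slope(x)

print(round(x, 4))  # very close to 0, the bottom of the parabola
```

This is the essence of gradient descent; real training does the same thing, just with millions of weights and a loss computed over batches of images instead of a single x.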

Another concern is that if the step size is too small, we may end up stuck in local grooves or minima. Going back to the bowl and marble, imagine that the bowl is not smooth, but has small grooves or pockets where the marble could get stuck. To pass over such small holes, the marble would have to “jump over” them, or take larger steps, to get to the actual bottom of the bowl. In other words, it should head towards the global minimum, not get stuck in a local minimum.

If we step too little, we could get stuck in local minima.

So surely, we can just make the step size as large as we can, right? That will make us reach the bottom faster too. Well, we do want a decently large step size, but we should be careful not to set it too large. Here’s what could happen. Let’s say that we’re near the global minimum already:

We’re almost at the bottom! Just need to step a bit more.

We know the direction of the incline, so we step towards the left, but if the step is too big, we’ll actually miss the global minimum:

We missed the bottom by stepping too much!

So now we’re on the other side. Worse, the incline here is steeper. It generally makes sense to scale the step size with how steep the incline is, because a steep slope probably means we’re still far from the bottom. In this case, however, here’s what happens:

We just bounced back.

We leap back to the right side, farther from the bottom than we originally were. And this could continue, on and on…

Bigger steps could mean faster convergence, but they could also lead to non-convergence!

And here we clearly see that rather than converging closer to the bottom, we keep overshooting and oscillating indefinitely. This is what could have happened to Samsan Tech when their model failed to converge. Aside from tuning the step size, there are other techniques that could be tried (e.g., taking into account the “momentum” of how well we were doing so far), along with other things like cleaning or preprocessing the data, or even just getting more and better images. In practice, it’s really a mix of empirical trial and error, theory, and rules of thumb from the state-of-the-art literature.
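You can see all three regimes for yourself with a small sketch. It runs gradient descent on the same parabolic loss x² (slope 2x) with three step sizes: a modest one that converges, one that bounces between the same two points forever, and one that overshoots farther each time:

```python
def slope(x):
    return 2 * x  # derivative of the parabolic loss x^2

def descend(step_size, start=5.0, steps=10):
    # Run a few gradient descent steps and record where we land each time.
    x = start
    history = [x]
    for _ in range(steps):
        x = x - step_size * slope(x)
        history.append(round(x, 3))
    return history

print(descend(0.1))   # shrinks steadily toward 0: converges
print(descend(1.0))   # flips between 5 and -5 forever: oscillates
print(descend(1.05))  # the magnitude grows every step: diverges
```

With step size 1.0, each update maps x to exactly −x, so the loss never improves; anything larger makes every step land farther from the bottom than the last, just like the bouncing Samsan Tech graph.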

Terminology Trivia

And we’re done! Whew! That was admittedly quite a lot to cover. I applaud you if you reached this point, as this material is usually spread over undergraduate or graduate semesters on computer vision and artificial intelligence.

If you recall, in one of the scenes I mentioned to note the terms “generator” and “discriminator”. We’ve gone through neural network training and objective functions but have not mentioned generators. That’s because they come from a special kind of neural network setup called a Generative Adversarial Network (GAN). In fact, GANs are a good example of what other loss functions are out there, and of cases where things are maximized instead! There are actually two neural networks here: one tries to maximize the objective, while the other tries to minimize it.

If you’re intrigued and ready for more, watch out for Part 2 (published next weekend; you can get notified here), where we’ll discuss GANs and the excitement around them, as well as other things about Start-Up that they got right, beyond just the technicals!

Acknowledgments

Thanks to Lea for her suggestions and reviewing early drafts of this post.

Want more? Join the email list for more stories like this, from tech and education to productivity and self-improvement.

Connect on Twitter, LinkedIn for more frequent, shorter updates, insights, and resources.


An AI Practitioner’s Guide to the Kdrama Start-Up was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
