ISSUE #20 - AUGUST 22, 2023
The story and the key learnings (not limited to product) after training 155 AI models during my summer holidays.
During my summer holidays, I trained over 150 AI models. Why? Firstly, I find working on something new and creative very fascinating, but most importantly, as a product manager, I have never worked on an AI-focused product. So, I wanted to gain first-hand experience working on such a project and possibly gain some learnings that could benefit me professionally.
With that in mind, I looked into several public datasets and decided to work on one where I would classify patients as diabetic or not based on eight different parameters from their medical history. The best result I achieved was 96.3% accuracy on my model's predictions (measured on a separate validation dataset). Although accuracy is not always the most important evaluation metric for an AI model, in this particular case I relied on it alone to evaluate my models, for simplicity.
In this post, I will go over the story and my learnings. At the end, you can find a link to my GitHub repository with the code I used to develop and train my models. You can read the full story first and then proceed with the learnings or skip directly to the learnings section. It's up to you.
As soon as I decided which dataset I was going to work on, I had to make some high-level decisions about my approach, as these would dictate some of the work I would have to do later on. I was dealing with a simple binary classification problem: classifying patients into two classes, diabetic or not. So, the first high-level decision I made was to tackle this problem with a neural network, which is a standard (though not the only) tool for classification problems. This decision mattered because it dictated the libraries, tools, and frameworks I would use later on, as well as the preparation work I would have to do on my dataset. For instance, I knew I would use tools such as NumPy to work with my data and TensorFlow to train my models.
Accordingly, I knew I had to prepare my data to be ingested into a neural network. First and foremost, that meant that in its final form, my dataset would need to have a specific format, and I would need to split it into three segments: a training set, a validation set, and a test set.
Each of those datasets would be split into two vectors: one with the training parameters (the X params) and one with the labels that indicate the final result (the y params).
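To make that target format concrete, here is a minimal sketch of such a three-way split in Python, assuming the data sits in a CSV file with a "diabetes" label column (the file and column names are assumptions, not taken from the original code):

```python
# Minimal sketch of the three-way split into train/validation/test sets,
# each consisting of X (features) and y (labels) arrays.
# File and column names are assumptions, not the original schema.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("diabetes.csv")

X = df.drop(columns=["diabetes"]).values  # the X params (features)
y = df["diabetes"].values                 # the y params (labels)

# First carve out a held-out test set, then split the rest into
# training and validation sets.
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.15, random_state=42
)
```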
However, this would only be the final form of my data. Getting there was a long journey split into two distinct phases: in the first, I explored my dataset, and in the second, I had to get my hands dirty and prepare my data.
If I could say just one thing about this phase, it is that I never imagined how important it would be. Before I started, I naively assumed that the structure and content of my data would be straightforward and largely as expected. I should have known better after ten years of working in software: many things can go wrong in a large dataset. You can encounter unexpected values, and you will almost certainly have missing values or even unexpected data types.
I realized this by accident while looking at the values of one of the data categories: gender. How many distinct values could exist under this category? As you can easily imagine, more than I expected. Once I realized that, I went over every single category, and alongside many surprises, I understood that I would have to spend significant time preparing my data. I would also have to make some decisions about how I would use it.
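A few lines of pandas are enough to surface this kind of surprise, listing the distinct values per category along with missing values and data types (column names below are assumptions):

```python
# Quick exploration pass: distinct values per categorical column,
# plus missing values and data types. Column names are assumptions.
import pandas as pd

df = pd.read_csv("diabetes.csv")

# Distinct values and their counts for each categorical column
for column in ["gender", "smoking_history"]:
    print(df[column].value_counts(dropna=False), "\n")

# Missing values and data types are just as easy to check
print(df.isna().sum())
print(df.dtypes)
```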
For example, one of the categories was smoking history. The values under this category were "No Info", "current", "ever", "former", "never", and "not current". The problem was the "No Info" value, which accounted for approximately 35,000 samples. This meant that for about 35% of my dataset, I practically had no information about smoking history. In this case, I had to make a decision: what would I do with this category? I had some options:
At that point, I decided that I didn't have enough knowledge to pick one of those options, so I created three variations of my dataset and later fed them to a simple model to see which option worked better in terms of accuracy, so I could double down on it. Sneak peek: the first one worked better, so that's the option I went with. In general, there were several similar small decisions I had to make while preparing my data.
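Since the exact options are not spelled out here, the snippet below only illustrates the general "build variants and compare" approach with plausible stand-in variations, not the actual ones I compared:

```python
# Illustrative dataset variations for the "No Info" smoking-history problem.
# These are plausible stand-ins, not the author's actual options; they only
# show the idea of building variants and comparing them on a simple model.
import pandas as pd

df = pd.read_csv("diabetes.csv")

# Variant A: keep "No Info" as its own category and one-hot encode everything
variant_a = pd.get_dummies(df, columns=["smoking_history"])

# Variant B: drop the ~35% of rows with no smoking information
variant_b = pd.get_dummies(
    df[df["smoking_history"] != "No Info"], columns=["smoking_history"]
)

# Variant C: drop the smoking-history column altogether
variant_c = df.drop(columns=["smoking_history"])
```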
Once I understood my data well, which included visualizing it, I started preparing it to be ingested into my neural network. That involved both small and more extensive tasks, such as:
In general, this phase didn't take long. Once I knew what the jobs to be done were, getting through them was pretty easy. However, I reckon that in a large-scale project, there should be dedicated pipelines and processes taking care of the data preparation part, as it is integral to the success of such a project.
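The exact tasks are not listed above, but a typical preparation pass for feeding tabular data into a neural network looks roughly like this (column names are assumptions):

```python
# Rough sketch of typical preparation steps: one-hot encode categoricals,
# scale numeric columns, and cast everything to float32 NumPy arrays.
# Column names are assumptions, not the actual dataset schema.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("diabetes.csv")
df = pd.get_dummies(df, columns=["gender", "smoking_history"])

numeric_cols = ["age", "bmi", "HbA1c_level", "blood_glucose_level"]  # assumed names
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])

X = df.drop(columns=["diabetes"]).to_numpy(dtype=np.float32)
y = df["diabetes"].to_numpy(dtype=np.float32)
```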
On a high level, what you need to understand about neural networks and how they work is the following: neural networks consist of layers, and those layers consist of nodes (the neurons). The data is initially ingested in the input layer; once its neurons process it, several parameters are extracted and passed to the next layer to be processed by its neurons, and so on. Finally, in the case of a binary classification model like the one I was working on, the parameters reach the output layer, which consists of two neurons (because there are only two potential classes), and we get a decision on whether the patient is diabetic or not.
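In TensorFlow/Keras, a network like the one described above can be sketched in a few lines; the layer sizes here are illustrative, not the final architecture discussed below:

```python
# Minimal TensorFlow/Keras sketch of a binary classifier with a two-neuron
# output layer (one per class). Layer sizes are illustrative only, and the
# input size depends on how the features were preprocessed.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),               # eight medical-history features
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # diabetic / not diabetic
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```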
In terms of layers and neurons, one of the decisions I had to make was the general architecture of my model. In other words, how many layers would I use, and how many nodes (neurons) would each layer have? The more layers and nodes you have, the more complicated the architecture, which means more time and resources to train your models and, in some cases, a higher risk of overfitting (meaning the model performs great on your training data but poorly out in the wild).
My initial idea was to create three different model architectures (a simple one, a slightly more complicated one, and a super-complicated one) and see which would perform better. Then, based on the result, I would double down on that kind of architecture and optimize it further. I quickly realized that this idea wouldn't get me very far. To begin with, the results were at best inconclusive. There were only slight differences in the performance of each architecture, although the simpler ones gave a small indication that they could perform better.
Still, I realized that trying to figure out an optimal architecture like that was like shooting in the dark, trying to hit the jackpot. So, I decided to take a more experimental approach.
This is when I decided to use an optimization function. This function would programmatically create and train a model for every potential combination of layers and nodes per layer, and it would let me pick the model with the best performance. For instance, if I wanted to try every potential combination of 1, 2, or 3-layer architectures, where each layer could contain anywhere between three and seven nodes, there would be 155 possible architectures to try (5 + 25 + 125). As you can easily understand, it would take forever to do this manually. However, it took a little less than 40 minutes for the optimization function I created to compile and train those models for me and let me pick the best one.
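The snippet below is a sketch of that kind of optimization function under the assumptions shown earlier; it is not the original code, just an illustration of a grid search over 1-3 hidden layers with 3-7 nodes each:

```python
# Sketch of an architecture search: build and train one model for every
# combination of 1-3 hidden layers with 3-7 nodes each (5 + 25 + 125 = 155
# candidates) and keep the one with the best validation accuracy.
# Function names and training settings are assumptions.
import itertools
import tensorflow as tf

def build_model(layer_sizes, n_features):
    model = tf.keras.Sequential([tf.keras.layers.Input(shape=(n_features,))])
    for size in layer_sizes:
        model.add(tf.keras.layers.Dense(size, activation="relu"))
    model.add(tf.keras.layers.Dense(2, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def find_best_architecture(X_train, y_train, X_val, y_val):
    best_acc, best_arch = 0.0, None
    for n_layers in (1, 2, 3):
        for layer_sizes in itertools.product(range(3, 8), repeat=n_layers):
            model = build_model(layer_sizes, n_features=X_train.shape[1])
            model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
            _, val_acc = model.evaluate(X_val, y_val, verbose=0)
            if val_acc > best_acc:
                best_acc, best_arch = val_acc, layer_sizes
    return best_arch, best_acc
```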
In those roughly 40 minutes, a MacBook Air was enough to train all 155 models. The best result I achieved was 96.3% accuracy, while the worst was 91.2%. In this case, the amount of data, the number of models, and their complexity were reasonable, so the time and resources needed to train them were well within reach, and I didn't need any performance- or resource-oriented optimizations. However, that would be essential in a larger-scale project, or the cost could quickly get out of hand.
That was it. In terms of duration, the whole project took me about eight days from start to finish. Each day I worked on it for an average of 2.5 hours.
If you are interested, you can find and download the final form of the code I used here, run it yourself on your machine, and see the results.
In the following section, I share the main thoughts and insights derived from this experience. While thinking about them, I approached the issue by asking, "What do I keep from this experience if I wanted to apply my learnings to a similar, larger-scale project (e.g., at work)?" Well, let's see:
In conclusion, it was a super-valuable experience for me, bringing me a step closer to understanding the world of data and AI. Hopefully, some of the above learnings will be useful in practice for you as well. However, if you're like me and want to familiarize yourself a bit more by getting your hands dirty, I strongly recommend doing something similar.