ISSUE #20 - AUGUST 22, 2023
The story and the key learnings (not limited to product) after training 155 AI models during my summer holidays.
During my summer holidays, I trained over 150 AI models. Why? Firstly, I find working on something new and creative very fascinating, but most importantly, as a product manager, I have never worked on an AI-focused product. So, I wanted to gain first-hand experience working on such a project and possibly gain some learnings that could benefit me professionally.
With that in mind, I looked into several public datasets and decided to work on one where I would classify patients as diabetic or not based on eight different parameters from their medical history. The best result I achieved was 96.3% accuracy on my model's predictions (measured on a separate validation dataset). Although accuracy is not always the most important evaluation metric for an AI model, in this particular case I relied on it alone to evaluate my models, for simplicity.
In this post, I will go over the story and my learnings. At the end, you can find a link to my GitHub repository with the code I used to develop and train my models. You can read the full story first and then proceed with the learnings or skip directly to the learnings section. It's up to you.
As soon as I decided which dataset I was going to work on, I had to make some high-level decisions about my approach, as these would dictate some of the work I would have to do later on. I was dealing with a simple binary classification problem: classifying patients into two classes, diabetic or not. So, the first high-level decision I made was to tackle this problem with a neural network, which is a standard (though not the only) tool for classification problems. This decision mattered because it dictated the libraries, tools, and frameworks I would use later on, as well as the preparation work I would have to do on my dataset. For instance, I knew I would use tools such as NumPy to work with my data and TensorFlow to train my models.
Accordingly, I knew I had to prepare my data to be ingested into a neural network. First and foremost, that meant that in its final form, my dataset would need to have a specific format, and I would need to split it into three segments: a training set, a validation set, and a test set.
Each of those datasets would be split into two vectors: one with the training parameters (the X params) and one with the labels that indicate the final result (the y params).
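To make that target format concrete, here is a minimal sketch of such a three-way split in Python, assuming the data sits in a CSV file with a "diabetes" label column (the file and column names are assumptions, not taken from the original code):

```python
# Minimal sketch of the three-way split into train/validation/test sets,
# each consisting of X (features) and y (labels) arrays.
# File and column names are assumptions, not the original schema.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("diabetes.csv")

X = df.drop(columns=["diabetes"]).values  # the X params (features)
y = df["diabetes"].values                 # the y params (labels)

# First carve out a held-out test set, then split the rest into
# training and validation sets.
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.15, random_state=42
)
```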
However, this would only be the final form of my data. Getting there was a long journey split into two distinct phases: in the first, I explored my dataset, and in the second, I had to get my hands dirty and prepare my data.
If I could say just one thing about this phase, it is that I never imagined how important it would be. Before I started, I naively assumed that the structure and content of my data would be straightforward and largely as expected. I should have known better after ten years of working in software: many things can go wrong in a large dataset. You can encounter unexpected values, and you will almost certainly have missing values or even unexpected data types.
I realized this by accident while looking at the values of one of the data categories: gender. How many distinct values could exist under this category? As you can easily imagine, more than I expected. Once I realized that, I went over every single category, and alongside many surprises, I understood that I would have to spend significant time preparing my data. I would also have to make some decisions about how I would use it.
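A few lines of pandas are enough to surface this kind of surprise, listing the distinct values per category along with missing values and data types (column names below are assumptions):

```python
# Quick exploration pass: distinct values per categorical column,
# plus missing values and data types. Column names are assumptions.
import pandas as pd

df = pd.read_csv("diabetes.csv")

# Distinct values and their counts for each categorical column
for column in ["gender", "smoking_history"]:
    print(df[column].value_counts(dropna=False), "\n")

# Missing values and data types are just as easy to check
print(df.isna().sum())
print(df.dtypes)
```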
For example, one of the categories was smoking history. The values under this category were "No Info", "current", "ever", "former", "never", and "not current". The problem was the "No Info" value, which accounted for approximately 35,000 samples. This meant that for about 35% of my dataset, I practically had no information about smoking history. In this case, I had to make a decision: what would I do with this category? I had some options:
At that point, I decided that I didn't have enough knowledge to pick one of those options, so I created three variations of my dataset and later fed them to a simple model to see which option worked better in terms of accuracy, so I could double down on it. Sneak peek: the first one worked better, so that's the option I went with. In general, there were several similar small decisions I had to make while preparing my data.
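Since the exact options are not spelled out here, the snippet below only illustrates the general "build variants and compare" approach with plausible stand-in variations, not the actual ones I compared:

```python
# Illustrative dataset variations for the "No Info" smoking-history problem.
# These are plausible stand-ins, not the author's actual options; they only
# show the idea of building variants and comparing them on a simple model.
import pandas as pd

df = pd.read_csv("diabetes.csv")

# Variant A: keep "No Info" as its own category and one-hot encode everything
variant_a = pd.get_dummies(df, columns=["smoking_history"])

# Variant B: drop the ~35% of rows with no smoking information
variant_b = pd.get_dummies(
    df[df["smoking_history"] != "No Info"], columns=["smoking_history"]
)

# Variant C: drop the smoking-history column altogether
variant_c = df.drop(columns=["smoking_history"])
```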
Once I understood my data well, which included visualizing it, I started preparing it to be ingested into my neural network. That involved both small and more extensive tasks, such as:
In general, this phase didn't take long. Once I knew what the jobs to be done were, getting through them was pretty easy. However, I reckon that in a large-scale project, there should be dedicated pipelines and processes taking care of the data preparation part, as it is integral to the success of such a project.
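The exact tasks are not listed above, but a typical preparation pass for feeding tabular data into a neural network looks roughly like this (column names are assumptions):

```python
# Rough sketch of typical preparation steps: one-hot encode categoricals,
# scale numeric columns, and cast everything to float32 NumPy arrays.
# Column names are assumptions, not the actual dataset schema.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("diabetes.csv")
df = pd.get_dummies(df, columns=["gender", "smoking_history"])

numeric_cols = ["age", "bmi", "HbA1c_level", "blood_glucose_level"]  # assumed names
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])

X = df.drop(columns=["diabetes"]).to_numpy(dtype=np.float32)
y = df["diabetes"].to_numpy(dtype=np.float32)
```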
On a high level, what you need to understand about neural networks and how they work is the following: neural networks consist of layers, and those layers consist of nodes (the neurons). The data is initially ingested in the input layer; once its neurons process it, several parameters are extracted and passed to the next layer to be processed by its neurons, and so on. Finally, in the case of a binary classification model like the one I was working on, the parameters reach the output layer, which consists of two neurons (because there are only two potential classes), and we get a decision on whether the patient is diabetic or not.
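In TensorFlow/Keras, a network like the one described above can be sketched in a few lines; the layer sizes here are illustrative, not the final architecture discussed below:

```python
# Minimal TensorFlow/Keras sketch of a binary classifier with a two-neuron
# output layer (one per class). Layer sizes are illustrative only, and the
# input size depends on how the features were preprocessed.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),               # eight medical-history features
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # diabetic / not diabetic
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```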
In terms of layers and neurons, one of the decisions I had to make was the general architecture of my model. In other words, how many layers would I use, and how many nodes (neurons) would each layer have? The more layers and nodes you have, the more complicated the architecture, which means more time and resources to train your models and, in some cases, a higher risk of overfitting (meaning the model performs great on your training data but poorly out in the wild).
My initial idea was to create three different model architectures (a simple one, a slightly more complicated one, and a super-complicated one) and see which would perform better. Then, based on the result, I would double down on that kind of architecture and optimize it further. I quickly realized that this idea wouldn't get me very far. To begin with, the results were at best inconclusive. There were only slight differences in the performance of each architecture, although the simpler ones gave a small indication that they could perform better.
Still, I realized that trying to figure out an optimal architecture like that was like shooting in the dark, trying to hit the jackpot. So, I decided to take a more experimental approach.
This is when I decided to use an optimization function. This function would programmatically create and train a model for every potential combination of layers and nodes per layer, and it would let me pick the model with the best performance. For instance, if I wanted to try every potential combination of 1, 2, or 3-layer architectures, where each layer could contain anywhere between three and seven nodes, there would be 155 possible architectures to try (5 + 25 + 125). As you can easily understand, it would take forever to do this manually. However, it took a little less than 40 minutes for the optimization function I created to compile and train those models for me and let me pick the best one.
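The snippet below is a sketch of that kind of optimization function under the assumptions shown earlier; it is not the original code, just an illustration of a grid search over 1-3 hidden layers with 3-7 nodes each:

```python
# Sketch of an architecture search: build and train one model for every
# combination of 1-3 hidden layers with 3-7 nodes each (5 + 25 + 125 = 155
# candidates) and keep the one with the best validation accuracy.
# Function names and training settings are assumptions.
import itertools
import tensorflow as tf

def build_model(layer_sizes, n_features):
    model = tf.keras.Sequential([tf.keras.layers.Input(shape=(n_features,))])
    for size in layer_sizes:
        model.add(tf.keras.layers.Dense(size, activation="relu"))
    model.add(tf.keras.layers.Dense(2, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def find_best_architecture(X_train, y_train, X_val, y_val):
    best_acc, best_arch = 0.0, None
    for n_layers in (1, 2, 3):
        for layer_sizes in itertools.product(range(3, 8), repeat=n_layers):
            model = build_model(layer_sizes, n_features=X_train.shape[1])
            model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
            _, val_acc = model.evaluate(X_val, y_val, verbose=0)
            if val_acc > best_acc:
                best_acc, best_arch = val_acc, layer_sizes
    return best_arch, best_acc
```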
In those roughly 40 minutes, a MacBook Air was enough to train all 155 models. The best result I achieved was 96.3% accuracy, while the worst was 91.2%. In this case, the amount of data, the number of models, and their complexity were reasonable, so the time and resources needed to train them were well within reach, and I didn't need any performance- or resource-oriented optimizations. However, that would be essential in a larger-scale project, or the cost could quickly get out of hand.
That was it. In terms of duration, the whole project took me about eight days from start to finish. Each day I worked on it for an average of 2.5 hours.
If you are interested, you can find and download the final form of the code I used here, run it yourself on your machine, and see the results.
In the following section, I share the main thoughts and insights derived from this experience. While thinking about them, I approached the issue by asking, "What do I keep from this experience if I wanted to apply my learnings to a similar, larger-scale project (e.g., at work)?" Well, let's see:
In conclusion, it was a super-valuable experience for me, bringing me a step closer to understanding the world of data and AI. Hopefully, some of the above learnings will be useful in practice for you as well. However, if you're like me and want to familiarize yourself a bit more by getting your hands dirty, I strongly recommend doing something similar.