Learning on the job versus learning in the classroom — what I’ve found trying both. Part 1

TLDR: Self-directed learning is hard!


“There is so much information on the internet now,” is a phrase I often hear. “There are tutorials, there are online courses, and there is Stack Overflow, a great forum for getting answers to your problems. Why would someone need a formal degree anymore? All the knowledge you need is online and available.”

This is a question that I wrestled with from the beginning of my career until the day I started a Master’s degree in Computer Science. Now that I am a couple of semesters into my course, I can say that I’ve experienced both having to self-learn concepts and practices, and having to review those same ideas through structured coursework. So I thought I would share my story. Rather than share my experiences through a bullet-pointed list, though, I thought it may be more useful to write out a few formative experiences that illustrate my thoughts on this subject.

There are three parts to my story — how I started with self-learning through internet resources and work projects, how I went over the same material in a formal setting, and how, since completing my module, I am now back to learning on the job to continuously improve myself.

Part 1: Self-learning through the internet and on the job.

Learning about databases

When I first started coding for work, our team needed to put together an SQL database to productionize our machine learning model. We had built a model in the way most people are familiar with — with CSVs for our training set and CSVs for our test set. We passed different versions of the files around to each other as email and chat attachments. It worked because our team was small and the data was static. However, this would not work in production, when we had to think of how business users would interact with our product. For that, we needed a more robust way of storing incoming data so it could be passed to our model. We also needed a way of storing our model’s predictions so they could later be retrieved by a user-facing application. A database was the way to go.
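The core of what we needed can be sketched in a few lines. This is a minimal illustration using SQLite from Python’s standard library; the table and column names here are hypothetical, not the ones we actually used:

```python
import sqlite3

# One table to persist the model's outputs so another application can read them.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE predictions (
        id INTEGER PRIMARY KEY,
        input_ref TEXT NOT NULL,   -- pointer to the incoming record that was scored
        score REAL NOT NULL,       -- the model's output
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

# The model writes its prediction...
conn.execute(
    "INSERT INTO predictions (input_ref, score) VALUES (?, ?)",
    ("order-1234", 0.87),
)
conn.commit()

# ...and the user-facing application retrieves it later.
row = conn.execute(
    "SELECT score FROM predictions WHERE input_ref = ?", ("order-1234",)
).fetchone()
print(row[0])  # 0.87
```

Everything beyond this skeleton — migrations, concurrent writers, access control — is where the real learning started.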

My team and I had never set up an SQL database before. There was no Databases CSXXX course that we had taken that had shown us the tools or the concepts we needed. Personally, all I had to go on was an introductory tutorial on SQLite that I had read from a blog, and some knowledge of SQL queries.

But we were willing to try, and we did have access to that wonderful invention — the internet — which had lots of great tutorials. We surfed around for a bit. We found the Flask Mega-Tutorial by Miguel Grinberg, which walks one through a step-by-step process of how to set up database models with SQLAlchemy, manage migrations with Alembic, and serve database queries with a Flask front end. The tutorial was great (I still reference it now) and answered our exact need: how to store data and retrieve it when necessary via a usable API interface. Another helpful colleague showed us how to draw out our ER diagrams with SQL Management Studio, and we basically built out our app from that foundation.

Further down the road, knowing that I wanted to learn more about databases, another helpful colleague (helpful colleagues are so valuable!) passed me the awesome Manga Guide to Databases by Mana Takahashi and Shoko Azuma. It was a humorous, light-hearted, and yet technically grounded introduction to why we need databases and what concepts we need to apply when thinking about them. In the book, a princess needs to manage the inventory of her country’s fruit empire. The characters move from passing information around on bits of paper to setting up a real database that incorporates normalisation, security, and speed. Paging through the book, laughing at the manga characters as I went, I learnt about the normal forms, normalisation, entity-relationship diagrams, concurrency and replication. The book was a great way to whet my appetite for more things database-related.
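To give a flavour of the normalisation idea the book teaches: if an exporter’s name is copied onto every shipment row, renaming the exporter means updating many rows and risks inconsistency (an update anomaly). Splitting the data into related tables fixes this. A small sketch with hypothetical fruit-inventory tables (the book’s actual schema differs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalised (roughly 3NF): exporters live in their own table, and
# shipments reference them by key instead of repeating the name.
conn.executescript("""
    CREATE TABLE exporters (
        exporter_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE shipments (
        shipment_id INTEGER PRIMARY KEY,
        exporter_id INTEGER NOT NULL REFERENCES exporters(exporter_id),
        fruit TEXT NOT NULL,
        quantity INTEGER NOT NULL
    );
    INSERT INTO exporters VALUES (1, 'Royal Fruit Co.');
    INSERT INTO shipments VALUES (1, 1, 'apple', 100), (2, 1, 'melon', 40);
""")

# One UPDATE now fixes the name everywhere it is referenced.
conn.execute("UPDATE exporters SET name = 'Royal Fruits Ltd.' WHERE exporter_id = 1")
rows = conn.execute("""
    SELECT e.name, s.fruit, s.quantity
    FROM shipments s
    JOIN exporters e ON e.exporter_id = s.exporter_id
""").fetchall()
```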

My interest piqued, I dived into a more serious book: Designing Data-Intensive Applications by Martin Kleppmann. Kleppmann is a great writer, and by clearly and simply taking the reader through the design principles and pros and cons of the tools used for processing and storing data, he aims to help people make good design decisions. The book had glowing reviews, and what it promised was exactly what I needed in my day-to-day.

It was also, however, 700 pages long and very detailed. Many of the concepts were new to me. I kind of got the idea behind idempotency, but not quite. Time synchronisation between distributed systems was another hard concept. It was clear that it would take a lot longer, and quite likely a lot more effort, than just a few readings before I internalised the concepts that were outlined.
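Idempotency, in the sense the book uses it, is the property that applying an operation twice leaves the system in the same state as applying it once — which matters because distributed systems retry failed requests. A toy sketch (the function names are mine, purely illustrative):

```python
# Idempotent: a retry of the same request changes nothing.
def set_status(record, status):
    record["status"] = status

# Not idempotent: each retry shifts the state further.
def increment_counter(record):
    record["count"] = record.get("count", 0) + 1

record = {}
set_status(record, "shipped")
set_status(record, "shipped")       # safe: same state as calling it once
increment_counter(record)
increment_counter(record)           # unsafe: a retry double-counts
print(record)  # {'status': 'shipped', 'count': 2}
```

This is why, for example, "set the order status to shipped" is a safer message to re-deliver than "add one to the shipped total".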

At this point, I had just see-sawed from an introductory book into one that was a lot more advanced. My learning didn’t follow a clear progression from basic to advanced, the way, for example, there is a progression between a CS1000 course and a CS2000 course. I had leapt from an introductory text straight into an extensive work, a work that dug into all the contours of the body of knowledge surrounding data processing technologies, without guardrails or intermediate steps.

As a result, for about two years from then on, my grasp of relational databases was always a bit patchy. I knew enough to put together a database that would serve a simple CRUD application. But I was fuzzy on the details of how a database index worked, on the trade-offs between different types of indexes, and on how a well-chosen index can speed up a query while a poorly-configured one barely makes a difference. One could say that I couldn’t have expected much more — my main job was building machine learning models, not optimising databases. It was only natural that I did not have the time to explore the knowledge landscape as comprehensively as I would have were I a database engineer. I could settle for passable knowledge, and focus on my core strengths.
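The index behaviour I was fuzzy on is easy to observe directly. A sketch using SQLite (table and index names are illustrative): `EXPLAIN QUERY PLAN` reports whether the engine scans the whole table or searches an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"cust-{i % 100}", float(i)) for i in range(1000)],
)

def plan(query):
    # EXPLAIN QUERY PLAN describes how SQLite intends to execute the query.
    return " ".join(str(row) for row in conn.execute("EXPLAIN QUERY PLAN " + query))

q = "SELECT * FROM orders WHERE customer = 'cust-7'"
before = plan(q)   # without an index: a full table SCAN
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(q)    # with one: SEARCH ... USING INDEX idx_orders_customer
print(before)
print(after)
```

The same index, of course, does nothing for a query filtering on `total` — choosing which columns to index against actual query patterns is exactly the trade-off I hadn’t internalised.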

Yet increasingly I felt that my computer science knowledge was a limiting factor to my growth, both in terms of my technical skills and my career progression. All it took was one poorly-answered interview question to show that I didn’t really know the fundamentals of how data flows through an organisation. Perhaps more frustratingly, it wasn’t because I couldn’t understand what was going on, I just hadn’t taken the time to work out the concepts in my head.

And this brings me to the real dilemma that I wrestled with as I considered self-learning versus a structured course during those years. There are many, many tutorials and lessons on the web, many of them beautifully structured, yet how does one progress from those beginner tutorials to something more intermediate, without falling into a rabbit hole of academic papers and cutting-edge technologies?

My only answer thus far is that this largely depends on lucky opportunities at work. For example, with Elasticsearch, I was lucky enough to have had a few interesting projects that went beyond what the standard Elasticsearch deployment offers. I needed to implement an Elastic index for Japanese search queries, which required me to go beyond how the standard analyser works in Elastic and look at how the Japanese one worked to take into account a character-based language. In another instance, I needed to use Elastic for an auto-complete function, which required me to understand the different options for n-gram tokenisers, beyond the tokeniser that came out of the box. Throughout, my transition from a basic understanding of Elastic indexes to a less-basic one was helped by the really good documentation and easily searchable forums. I have also, however, had my fair share of experiences trying to make a library work when the library has minimal documentation. It’s not easy.
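For a sense of what those two projects involved, here is a sketch of the kind of index settings they required, written as a Python dict of the sort you would send to Elasticsearch when creating an index. The analyzer and field names are my own illustrative choices; `kuromoji_tokenizer` comes from Elastic’s analysis-kuromoji plugin (which must be installed), while `edge_ngram` is built in.

```python
# Hypothetical index settings combining the two cases discussed above.
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "autocomplete_tokenizer": {
                    # Emits leading fragments of each token ("to", "tok", ...)
                    # so partially typed input can match indexed terms.
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                # Japanese text has no spaces between words, so the kuromoji
                # tokenizer segments it morphologically instead of by whitespace.
                "ja_analyzer": {
                    "type": "custom",
                    "tokenizer": "kuromoji_tokenizer",
                },
                # Auto-complete: index edge n-grams, lowercased.
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "autocomplete_tokenizer",
                    "filter": ["lowercase"],
                },
            },
        }
    }
}
```

Getting from the default standard analyser to settings like these is precisely the beginner-to-intermediate step that good documentation made possible.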

And here lies an aspect to self-learning that I don’t see discussed much — autonomous learning is hard. Real understanding takes more than a few Coursera courses or copying and pasting code solutions from Stack Overflow. What is available on the web is great for getting a head start, but how much improvement happens after this head start largely depends on how good a self-directed learner one is. And self-direction is a meta-cognitive skill that is complex.

A description of being a self-directed learner is available from the Recurse Center, a centre offering programmers creative retreats to improve their craft.

“RC being self-directed means you’ll get the most out of it if you’re a smart, deliberate thinker and learner. RC is for you if you can pick things up quickly, be rigorous and introspective, and understand the limits of your knowledge” (source)

There are many skills listed in that description. I can’t say that I’ve mastered any one of them. Indeed, I see practicing these skills as an endeavour in its own right.

So teaching yourself from internet books and videos and even mini-projects is possible, but it does come with a big caveat. The big assumption is that you are an autonomous and efficient self-directed learner, and (perhaps more crucially) that you have the bandwidth to concentrate on a topic for a meaningful length of time. This is hard! Especially with the other things competing for your attention — other interesting topics, family, friends and so forth.

Over time, a more satisfying balance for me has been to view a subject both theoretically and practically, through both real-world projects and academic ones. In the next part of this series, I go through what this balance looks like for me currently. :)

I work with data in the little red dot