My experience interviewing for a full-time position at a large data science company in Australia

Photo by Denise Jans on Unsplash

About a year ago, I wrote an article documenting my initial steps into the world of data science and machine learning with hopes of one day breaking into the professional industry.

After months of hard work and perseverance, I am beyond excited to announce that that day has finally arrived — I have finally landed a full-time position at a large data science consultancy here in Melbourne, Australia!

For privacy reasons, I am not going to disclose the name of the company, at least until I have officially started working and everything has settled in. I do, however, want to use…


Model Interpretability

4 must-know techniques to create more transparency and explainability in model predictions

Photo by Will Porada on Unsplash

There is no doubt that machine learning models have taken the world by storm in recent decades. Their ability to identify patterns and generate predictions that far exceed any other form of statistical technique is truly remarkable and hard to contend with.

However, despite all of their promising advantages, many people still remain sceptical. Specifically, one of the main setbacks that machine learning models struggle with is their lack of transparency and interpretability.

In other words, although machine learning models are highly capable of generating predictions that are very robust and accurate, it often comes at the expense of complexity when…


Two of the most popular algorithms in the world of machine learning: which will win?

Photo by Geran de Klerk on Unsplash

If you have spent some time in the world of machine learning, you would have undoubtedly heard of a concept called the bias-variance tradeoff. It is one of the most important concepts any machine learning practitioner should learn and be aware of.

Essentially, the bias-variance tradeoff is a conundrum in machine learning which states that models with low bias will usually have high variance and vice versa.

Bias is the difference between the actual value and the expected value predicted by the model. A model with high bias is said to be oversimplified and, as a result, underfits the data.
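To make the definition of bias concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the true relationship (`true_function`, a simple quadratic), the deliberately oversimplified model (`fit_constant_model`, which always predicts the average of its training labels) and the simulation settings. It estimates bias at a single point as the actual value minus the average prediction over many simulated training sets.

```python
import random
from statistics import mean


def true_function(x):
    """Hypothetical ground truth used only for this illustration."""
    return x ** 2


def fit_constant_model(xs, ys):
    """An oversimplified model: ignore x and always predict the mean of y."""
    avg = mean(ys)
    return lambda x: avg


def estimate_bias(x0, n_trials=200, n_points=20, noise_sd=0.1, seed=0):
    """Actual value at x0 minus the average prediction across training sets."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    predictions = []
    for _ in range(n_trials):
        # Draw a fresh noisy training set on [0, 1) for every trial.
        xs = [rng.random() for _ in range(n_points)]
        ys = [true_function(x) + rng.gauss(0, noise_sd) for x in xs]
        model = fit_constant_model(xs, ys)
        predictions.append(model(x0))
    return true_function(x0) - mean(predictions)


bias_at_one = estimate_bias(1.0)
print(f"Estimated bias of the constant model at x = 1: {bias_at_one:.2f}")
```

Because the constant model cannot follow the curve, its average prediction stays far from the actual value at x = 1 no matter how many training sets are drawn, which is what underfitting looks like in practice.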


Start here if you are new to the exciting world of natural language processing

Photo by Nathan DeFiesta on Unsplash

Words, sentences, paragraphs and essays. We use them almost every day of our adult lives. Whether you are sending out a tweet, composing an email to your colleague, or writing an article as I am doing now, as humans, we all use words to communicate our thoughts and our ideas.

Now, imagine a world where we could teach computers to interact with words the same way that we would with another human being. …


Probability and statistics, machine learning algorithms, brainteasers, and more

Photo by Andrew Neel on Unsplash

This article was inspired by several other similar ones that I have used in preparation for my own data science interviews. I thought it would be a fun idea to compile my own comprehensive set of questions (with answers) and share it with the world!

The questions in this article are divided into 4 parts:

  • Probability and statistics — 20 questions
  • Machine learning fundamentals (non-algorithmic) — 31 questions
  • Machine learning algorithms — 11 questions
  • Brainteasers — 8 questions

These are generally basic to intermediate-level interview questions that should give you a good sense of what to expect in a technical…


10 tips that will help maximise your chances of getting hired as a fresh graduate

Photo: Christina/Unsplash

“Thank you for taking the time to apply. Unfortunately, we have decided to not progress with your application.”

“We have carefully reviewed your application against the program and assessment criteria and unfortunately, we will not be progressing your application to the next stage in the recruitment process.”

“We spoke to a number of well-qualified candidates and unfortunately, we have decided to move forward with other candidates whose backgrounds are a stronger match for this role.”

One by one, rejection emails started to pile up in my inbox.

I couldn’t help but think, “what did I do wrong?”, “what if I’m…


dplyr is R's counterpart to the Pandas library in Python, enabling easy data exploration and manipulation

Photo by Jeff Siepman on Unsplash

I started out my data science journey learning how to use the Pandas library and truthfully, there is everything to love about it: it is easy to use, straightforward and has functionality for just about any task that involves manipulating and exploring a data frame.

Heck, I even made a full video series on YouTube teaching other people how to use Pandas. Feel free to check it out (shameless plug)!

However, lately, I find myself spending more and more time on R, primarily because I am preparing for my actuarial exams but also because I am curious to learn the…


Linear regression is one of the most fundamental concepts in statistics; here’s how to perform and interpret it in R

Photo by Jean-Philippe Delberghe on Unsplash

It’s been a while since my last article on here and that’s because I have been busy preparing for my actuarial exam that is coming up in just two months. In the process of studying these past couple of weeks, I ran into a good old friend from way back in my first ever statistics class, linear regression.

As I have started to learn more complex machine learning algorithms, I sometimes get caught up in building the fanciest model to solve a problem when in reality, a simple linear model could have easily gotten the job done. …


A beginner’s guide to the great and powerful k-means algorithm

Photo by Thibault Penin on Unsplash

In this article, we will discuss k-means clustering, an unsupervised learning algorithm, and learn how to implement it in R.

Introduction to unsupervised learning and k-means clustering

First of all, what is unsupervised learning?

In contrast to supervised learning where the label (output) of a predictive model is explicitly specified in advance, unsupervised learning allows the algorithm to identify the clusters within the data itself and subsequently label them accordingly.

K-means clustering is an example of an unsupervised learning algorithm and it works as follows:

  1. Choose the number of clusters, K (this is what the k stands for in k-means clustering), which the data are to be…
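The list above is cut off after its first step, so the sketch below does not claim to reproduce the article's remaining steps. It shows the standard textbook version of k-means (often called Lloyd's algorithm) in plain Python rather than R. The function name `k_means`, the naive choice of the first K points as starting centroids and the sample points are all assumptions made for illustration.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Textbook k-means: assign points to the nearest centroid, then update."""
    # Assumption: use the first k points as initial centroids (naive but
    # deterministic; real implementations usually initialise randomly).
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: each point joins its closest centroid's cluster.
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments


sample_points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = k_means(sample_points, k=2)
print(labels)
print(centroids)
```

Note that the cluster labels are just indices assigned by the algorithm; nothing in the data says what each cluster means, which is the sense in which the learning is unsupervised.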


MinMaxScaler vs StandardScaler vs RobustScaler

Photo by Stepan Babanin on Unsplash

Feature scaling is the process of normalising the range of features in a dataset.

Real-world datasets often contain features that vary in magnitude, range and units. Therefore, in order for machine learning models to interpret these features on the same scale, we need to perform feature scaling.
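As a rough illustration of what the three scalers named in the title do, here is a minimal sketch in plain Python. It uses the formulas usually associated with each approach (min-max rescaling to the 0 to 1 range, standardising with the mean and population standard deviation, and robust scaling with the median and interquartile range) rather than the scikit-learn classes themselves. The helper names and the sample feature `incomes` are assumptions made for this example.

```python
from statistics import mean, median, pstdev, quantiles


def min_max_scale(values):
    """Rescale to the [0, 1] range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values):
    """Centre on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]


# Hypothetical feature with one extreme outlier.
incomes = [1, 2, 3, 4, 100]

print(min_max_scale(incomes))
print(standard_scale(incomes))
print(robust_scale(incomes))
```

With an outlier like the last value, min-max scaling squeezes the other four values close to zero, while robust scaling keeps them evenly spread because the median and interquartile range are barely affected by the extreme point.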

In the world of science, we all know the importance of comparing apples to apples and yet many people, especially beginners, have a tendency to overlook feature scaling as part of their data preprocessing for machine learning. …

Jason Chong

Actuarial Science Graduate & Data Scientist
