Posts

Loading data with data types
When reading static files into R or Python, most of the time we are lazy and load the data with no regard for data types. But in mission-critical ETL jobs or data analytics workflows, data types are essential, and there’s a fine line between life and death. OK, I’m exaggerating here. What I’ve written below is a Swiss Army knife function to read an Excel file: the 1st tab is the data and the 2nd tab is the variable types (e.
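The idea of pairing a data sheet with a sheet of declared types can be sketched in plain Python. This is only an illustration, not the function from the post: the type names in `CASTERS`, the helper `apply_types`, and the sample row are all hypothetical, and the Excel reading step is left out so the sketch stays self-contained.

```python
from datetime import date

# Hypothetical mapping from a declared type name to a casting function.
# The type names are assumptions, not the ones used in the original post.
CASTERS = {
    "integer": int,
    "numeric": float,
    "character": str,
    "date": date.fromisoformat,
}


def apply_types(rows, type_spec):
    """Cast each value in rows (a list of dicts) using type_spec {column: type name}."""
    typed = []
    for row in rows:
        typed.append({col: CASTERS[type_spec[col]](val) for col, val in row.items()})
    return typed


# Stand-in for the 1st tab (data, read as text) and the 2nd tab (variable types).
rows = [{"id": "1", "price": "9.5", "sold_on": "2019-01-31"}]
spec = {"id": "integer", "price": "numeric", "sold_on": "date"}
typed_rows = apply_types(rows, spec)
```

Keeping the types in a separate table means a type change is a data edit rather than a code edit, which is presumably the appeal of the two-tab layout.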

CONTINUE READING

Describing unsupervised learning clusters
As data scientists / analysts, besides doing cool modelling stuff, we’re often asked to churn out descriptive statistics. Yes, we know. It’s part of the process. I chanced upon this really nifty concept at work to describe the clusters derived from unsupervised learning. Here’s how it goes. Say it’s a nominal or ordinal variable. First, I find the proportion of the feature across the X clusters. Second, I rank this proportion through percentiles across these X values. The cluster with the highest percentile earns the right to be represented by the feature. And if it’s a scale variable, you may find the mean of the feature for each cluster and repeat the steps.
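The steps for a nominal variable can be sketched as follows. This is a minimal illustration under my own assumptions: the function name `describe_clusters`, the toy labels, and the tie-breaking rule (the lowest cluster id wins) are not from the post.

```python
from collections import Counter


def describe_clusters(labels, values):
    """Map each category to the cluster where its proportion ranks highest.

    labels: cluster id per row; values: category of the nominal feature per row.
    """
    sizes = Counter(labels)               # rows per cluster
    counts = Counter(zip(labels, values)) # rows per (cluster, category)
    clusters = sorted(sizes)
    result = {}
    for cat in sorted(set(values)):
        # Step 1: proportion of this category within each cluster.
        props = {c: counts[(c, cat)] / sizes[c] for c in clusters}
        # Step 2: percentile rank of each cluster's proportion among all clusters.
        ranks = {
            c: sum(p <= props[c] for p in props.values()) / len(props)
            for c in clusters
        }
        # Step 3: the cluster with the highest percentile represents the category.
        result[cat] = max(clusters, key=lambda c: ranks[c])
    return result


labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
values = ["a", "a", "b", "b", "b", "a", "c", "c", "a", "b"]
representation = describe_clusters(labels, values)
```

For a scale variable, the same ranking could be applied to per-cluster means instead of proportions.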

CONTINUE READING

Google Place API
I was playing around with the API to obtain lat-long for my geo analytics work. I entered my credit card info, but it seems that I haven’t been charged even after 9000+ API calls. Perhaps it’s because I have 400+ dollars of free cloud credit? Anyway, what I did here was to make API calls and store the data in my local database. If you’re interested, you may visit this stackoverflow link (https://stackoverflow.

CONTINUE READING

Simulating product failures
I’m inspired by this post here (http://www.programmingr.com/examples/neat-tricks/sample-r-function/rexp/) and decided to expand on the example. Say you are the owner of a computer store and you would like to estimate the frequency of warranty repairs, and the ensuing costs. Here’s the scenario with the accompanying assumptions: each computer is expected to last an average of 7 years; you sell only 1000 computers at the start of each year; you sell computers from 2019 to 2025. First, I simulate an exponential distribution of 1000 points for 7 years, and place a time index of 2019 to 2025
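A rough Python sketch of the scenario above, using only the standard library. The function name, the seed, and the rule that a failure is booked in the calendar year `sale_year + floor(lifetime)` are my assumptions; the original post works in R and may organise the simulation differently.

```python
import random
from collections import Counter


def simulate_failures(mean_life=7.0, units_per_year=1000,
                      years=range(2019, 2026), seed=42):
    """Count simulated failures per calendar year.

    Lifetimes are drawn from an exponential distribution with the given mean.
    Assumption: units are sold at the start of the year, so a unit sold in
    year y with lifetime t fails in calendar year y + floor(t).
    """
    rng = random.Random(seed)
    failures = Counter()
    for sale_year in years:
        for _ in range(units_per_year):
            lifetime = rng.expovariate(1.0 / mean_life)  # rate = 1 / mean
            failures[sale_year + int(lifetime)] += 1
    return failures


failures = simulate_failures()
# Failures that fall inside the selling window 2019 to 2025.
in_window = {year: failures[year] for year in range(2019, 2026)}
```

Multiplying each year's count by an assumed repair cost would give the ensuing cost estimate.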

CONTINUE READING

Q learning
I just completed a Reinforcement Learning assignment, in particular on Q-learning. According to Wikipedia, it’s a model-free RL algorithm. The goal of the algo is to learn a policy, which tells an agent what action to take under different circumstances. Here’s my confession. What I’m doing in this post is to summarise what I’ve just learnt so that I may come back to this at any point in the future.
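For reference, the core of tabular Q-learning is a single update rule, sketched below. The state and action names, the learning rate `alpha`, and the discount `gamma` are illustrative assumptions, not values from the assignment.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the table q and return the new value.

    q is a dict of dicts: q[state][action] -> estimated value.
    """
    best_next = max(q[next_state].values())       # greedy value of the next state
    td_target = reward + gamma * best_next         # bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state table for illustration.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}
new_value = q_update(q, "s0", "right", reward=0.0, next_state="s1")
```

The update needs no model of the environment, only observed transitions, which is what "model-free" refers to.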

CONTINUE READING

What’re the returns (XIRR) for my CPFIS Portfolio?
Every employee in Singapore is bound by the same set of CPF rules. As an ex-economist / data geek who doesn’t shy away from having skin in the game, I asked myself this question back in 2015, when I was still a starry-eyed young man 2 years into the workforce: how do I set out to optimize my returns in my CPF OA with this given set of constraints,
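As a reminder of what XIRR measures: it is the annual rate at which the dated cash flows discount to a net present value of zero. Below is a minimal sketch that finds that rate by bisection. The cash flows are made up for illustration, and the 365-day year and the search bounds are assumptions.

```python
from datetime import date


def xnpv(rate, cashflows):
    """Net present value of (date, amount) pairs, discounted from the first date."""
    t0 = cashflows[0][0]
    return sum(
        amount / (1 + rate) ** ((day - t0).days / 365)
        for day, amount in cashflows
    )


def xirr(cashflows, lo=-0.99, hi=10.0, tol=1e-9):
    """Find the rate where xnpv is zero by bisection.

    Assumes the NPV changes sign exactly once between lo and hi.
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if xnpv(lo, cashflows) * xnpv(mid, cashflows) <= 0:
            hi = mid   # the root lies in [lo, mid]
        else:
            lo = mid   # the root lies in [mid, hi]
        if hi - lo < tol:
            break
    return (lo + hi) / 2


# Made-up example: invest 1000, receive 1100 exactly 365 days later.
flows = [(date(2015, 1, 1), -1000.0), (date(2016, 1, 1), 1100.0)]
rate = xirr(flows)
```

Contributions are entered as negative amounts and the final portfolio value as a positive amount on the valuation date.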

CONTINUE READING

Following the steps here (https://realpython.com/flask-by-example-part-1-project-setup/), I managed to deploy my Python Flask app on Heroku.

```python
from flask import Flask

app = Flask(__name__)


# Root URL returns a fixed greeting.
@app.route('/')
def hello():
    return "Hello World!"


# Any other path segment is treated as a name to greet.
@app.route('/<name>')
def hello_name(name):
    return "Hello {}!".format(name)


if __name__ == '__main__':
    app.run()
```

You may visit the following link, https://jirong-stage.herokuapp.com/, and add a suffix to it. For example, https://jirong-stage.herokuapp.com/jirong will return Hello jirong! The possibilities are immense! I can easily create APIs or host dashboards here.

CONTINUE READING

Sampling with replacement
Hello! It’s me once again, attempting to explain things from first principles, a term popularized by Elon Musk. I will use some pseudo code, on sampling with replacement for weights, to aid my explanation. Earlier in the week, I attempted to write a simple function from scratch, but I gave up after realising that it would take me more than 15 mins! The difficulty lies in the multiple switch statements needed to define the intervals.
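One way to avoid a chain of switch statements is to store the interval boundaries as cumulative weights and locate each random draw with a binary search. The sketch below is my own illustration of that idea rather than the pseudo code from the post; the function name and the sample items are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def sample_with_replacement(items, weights, k, rng):
    """Draw k items with replacement, with probability proportional to weights."""
    # Each item owns the interval between consecutive cumulative weights.
    cumulative = list(accumulate(weights))
    total = cumulative[-1]
    picks = []
    for _ in range(k):
        u = rng.random() * total                # uniform point on [0, total)
        index = bisect_right(cumulative, u)     # interval that contains u
        index = min(index, len(items) - 1)      # guard against float rounding
        picks.append(items[index])
    return picks


rng = random.Random(0)
picks = sample_with_replacement(["a", "b", "c"], [1, 2, 3], 1000, rng)
only_y = sample_with_replacement(["x", "y"], [0, 1], 50, rng)
```

Because the same cumulative list is reused for every draw, an item can be picked any number of times, which is what makes this sampling with replacement.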

CONTINUE READING

Building a decision tree from scratch
To truly understand and internalise an algorithm, it’s useful to build it from scratch rather than relying on a module or library written by someone else. I’m fortunate to have been given the chance to do this in one of my assignments on decision trees. For this exercise, I had to rely on my knowledge of recursion, binary trees (in-order traversal) and object-oriented programming.

CONTINUE READING

Martingale Strategy
In this post, I will simulate a martingale strategy in the context of roulette to highlight the potential risks associated with this strategy. Double down! That’s the essence of it. Here’s a simple explanation of the strategy. The croupier spins the wheel. If it’s red you win the amount you bet; if it’s black you lose the same amount. If you win, you continue to bet the same amount (same as your 1st bet amount). If you lose, you double your bet amount. And if your accumulated winnings hit a certain amount, you stop and leave the casino. So how would the strategy fare?
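The rules above can be sketched as a small simulation. The bankroll limit, the spin cap, the target, and the even 50/50 win probability are my assumptions to keep the example finite and simple; a real European wheel pays red with probability 18/37 because of the green zero, which can be passed in as `p_win`.

```python
import random


def play_martingale(base_bet=1, target=80, bankroll=256,
                    p_win=0.5, max_spins=1000, rng=None):
    """Play one martingale session and return the accumulated winnings.

    Stops when the target is reached, when the next bet cannot be covered
    by the remaining bankroll, or after max_spins spins.
    """
    rng = rng or random.Random()
    winnings, bet = 0, base_bet
    for _ in range(max_spins):
        if winnings >= target:
            break                       # target reached: leave the casino
        if bet > bankroll + winnings:
            break                       # cannot afford to double again
        if rng.random() < p_win:
            winnings += bet
            bet = base_bet              # after a win, go back to the first bet
        else:
            winnings -= bet
            bet *= 2                    # after a loss, double the bet
    return winnings


# Repeat many sessions to see how often the strategy ends in a large loss.
rng = random.Random(1)
results = [play_martingale(rng=rng) for _ in range(1000)]
```

With a finite bankroll, a long enough losing streak ends the session with a loss far larger than the base bet, which is the risk the post sets out to highlight.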

CONTINUE READING