programming

Regime detection through hidden Markov model

It’s rumoured, according to the book ‘The Man Who Solved the Market’, that hidden Markov models were used for regime detection in the early days of Renaissance Technologies. Here I am, a couple of decades later, employing the same strategy. This will be integrated into my ‘Jarvis’, a series of algorithmic toolkits that advise me in all situations. A hidden Markov model is a statistical unsupervised learning model that infers unobserved (hidden) states from a sequence of observations.
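
To make the idea of hidden states concrete, here is a minimal sketch of Viterbi decoding for a two-regime hidden Markov model. Everything in it is an assumption for illustration: the two regimes ("calm" and "turbulent"), the Gaussian emissions, the hand-picked parameters and the made-up returns. It is not the method from the book or from this post, and in practice the parameters would be estimated from data rather than fixed by hand.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def viterbi(observations, start_prob, trans_prob, means, stds):
    """Return the most likely hidden-state path for the observations."""
    n_states = len(start_prob)
    # Log-probability of the best path ending in each state after the first observation.
    scores = [
        math.log(start_prob[s]) + gaussian_logpdf(observations[0], means[s], stds[s])
        for s in range(n_states)
    ]
    backpointers = []
    for x in observations[1:]:
        new_scores, pointers = [], []
        for s in range(n_states):
            # Best previous state to arrive from when landing in state s.
            best_prev = max(
                range(n_states),
                key=lambda p: scores[p] + math.log(trans_prob[p][s]),
            )
            pointers.append(best_prev)
            new_scores.append(
                scores[best_prev]
                + math.log(trans_prob[best_prev][s])
                + gaussian_logpdf(x, means[s], stds[s])
            )
        scores = new_scores
        backpointers.append(pointers)
    # Walk the backpointers from the best final state to recover the path.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical daily returns: quiet, then volatile, then quiet again.
returns = [0.001, 0.002, -0.001, 0.0005, -0.03, 0.04, -0.05, 0.035,
           0.001, -0.002, 0.0015, 0.0]

# Assumed parameters: state 0 = calm (low volatility), state 1 = turbulent.
regimes = viterbi(
    returns,
    start_prob=[0.5, 0.5],
    trans_prob=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 0.0],
    stds=[0.005, 0.03],
)
print(regimes)
```

The sticky transition matrix (0.9 on the diagonal) is what keeps the decoded path from flipping regime on every noisy observation; a library such as hmmlearn would normally be used to fit these numbers instead of guessing them.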

Embedding D3 interactive charts part 2

Part 2: testing reading in a file from a directory. Just having fun, testing to see if I could embed D3 charts in my blog. Seems like it works too! But I have to upload the CSV to the public folder first. // set the dimensions and margins of the graph var margin = {top: 10, right: 30, bottom: 30, left: 60}, width = 460 - margin.

Embedding D3 interactive charts

Just having fun, testing to see if I could embed D3 charts in my blog. Seems like it works! // set the dimensions and margins of the graph var margin = {top: 10, right: 30, bottom: 30, left: 60}, width = 460 - margin.left - margin.right, height = 450 - margin.top - margin.bottom; // append the svg object to the body of the page var svg = d3.

Fuzzy matching with many to many matches without loops

Fuzzy matching. As a computer science graduate, I always strive to reduce computational complexity through parallelization or vectorization! Explicit loops in data science are the root of all evil! For loops and while loops have their place, but definitely not in the data science space (a fairly broad statement, I know). In this post, I hope to show a really cool example that avoids the dreaded O(n²) complexity. I will be using fuzzy matching to find the closest match for the strings in data frame 2 (df2) against data frame 1 (df1).
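
For readers new to fuzzy matching, here is a minimal sketch of what "closest match" means, using only Python's standard-library difflib. It deliberately uses a plain loop and compares every pair, so it is the baseline the post sets out to improve on, not the loop-free approach itself (which the excerpt does not show). The company names and the helper `closest_matches` are hypothetical.

```python
import difflib


def closest_matches(queries, candidates):
    """Map each query string to its most similar candidate and the similarity score."""
    results = {}
    for query in queries:
        # cutoff=0.0 always returns the best candidate, however poor the match.
        best = difflib.get_close_matches(query, candidates, n=1, cutoff=0.0)[0]
        score = difflib.SequenceMatcher(None, query, best).ratio()
        results[query] = (best, score)
    return results


# Hypothetical stand-ins for the string columns of df1 and df2.
df1_names = ["Apple Inc", "Microsoft Corporation", "Alphabet Inc"]
df2_names = ["Appel Inc.", "Microsft Corp", "Alphabet"]

matches = closest_matches(df2_names, df1_names)
for query, (best, score) in matches.items():
    print(f"{query!r} -> {best!r} (similarity {score:.2f})")
```

In practice you would also set a minimum similarity so that strings with no sensible counterpart are left unmatched rather than paired with the least-bad candidate.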

Latest lessons learnt from crawling

Lessons learnt. I just realised that there’s a quick way to understand XPath patterns. In the past, I would manually eyeball the page source or the inspector to infer the patterns. Silly me! One quick way to understand the pattern is as follows: right-click on the element you are interested in on the web page and click ‘Inspect’; right-click on the highlighted node and click ‘Copy’; choose ‘Copy full XPath’; and paste it into a notepad.
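
Here is a rough sketch of how pasted full XPaths can reveal a pattern. It assumes two copied paths that differ only in their final index, a tiny hypothetical page that is well-formed XML, and a hypothetical helper `find_by_full_xpath`. The standard-library ElementTree only supports a subset of XPath and cannot parse messy real-world HTML, so an actual crawler would more likely use a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a crawled page.
PAGE = (
    "<html><body>"
    "<div><p>intro</p></div>"
    "<div><a href='/post/1'>First post</a><a href='/post/2'>Second post</a></div>"
    "</body></html>"
)


def find_by_full_xpath(root, full_xpath):
    """Evaluate a browser-style full XPath against an ElementTree root element."""
    prefix = "/" + root.tag
    if full_xpath != prefix and not full_xpath.startswith(prefix + "/"):
        raise ValueError(f"path must start with {prefix}")
    # ElementTree rejects absolute paths, so search relative to the root element.
    return root.findall("." + full_xpath[len(prefix):])


root = ET.fromstring(PAGE)

# Two paths as they might be pasted from 'Copy full XPath': only the last index differs.
first_link = find_by_full_xpath(root, "/html/body/div[2]/a[1]")
second_link = find_by_full_xpath(root, "/html/body/div[2]/a[2]")

# Dropping the final index generalises the pattern to every link in that block.
all_links = find_by_full_xpath(root, "/html/body/div[2]/a")
print([link.text for link in all_links])
```

Comparing the two pasted paths side by side is what exposes the varying index; the generalised path is then the one to hand to the crawler.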

Asset allocation notification

I’m in the midst of automating and guiding my life with algorithms (largely inspired by Ray Dalio), and one of the guidelines I have set is on asset allocation: (1) emerging-market and developed-market holdings should be of the same proportion; (2) the bonds + cash proportion should be equivalent to my age. This can deviate in times of crisis when I want to be more opportunistic. If the portfolio deviates from the portfolio policy statement, I get a Pushover notification on my phone :)
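
The two guidelines above can be sketched as a simple check. This is only an illustration under stated assumptions: weights are expressed in percent of the portfolio, "equivalent to my age" is read as bonds + cash percent equal to age in years, and the 5-point `tolerance` and the function name `check_allocation` are made up. Sending the actual Pushover notification is left out.

```python
def check_allocation(weights, age, tolerance=5.0):
    """Return a list of alert messages for breaches of the allocation policy.

    weights: dict of asset class -> percent of portfolio (assumed keys below).
    tolerance: allowed drift in percentage points (an assumed value).
    """
    alerts = []

    # Rule 1: emerging and developed markets should be held in equal proportion.
    gap = weights["emerging"] - weights["developed"]
    if abs(gap) > tolerance:
        alerts.append(f"Emerging vs developed markets differ by {gap:+.1f} points")

    # Rule 2: bonds + cash (in percent) should match age (in years).
    defensive = weights["bonds"] + weights["cash"]
    if abs(defensive - age) > tolerance:
        alerts.append(f"Bonds + cash is {defensive:.1f}% against a target of {age}%")

    return alerts


# Hypothetical portfolios for a 30-year-old investor.
on_policy = {"emerging": 35.0, "developed": 35.0, "bonds": 20.0, "cash": 10.0}
drifted = {"emerging": 50.0, "developed": 35.0, "bonds": 10.0, "cash": 5.0}

print(check_allocation(on_policy, age=30))
for alert in check_allocation(drifted, age=30):
    print(alert)
```

Each returned message would be the text of one notification; an empty list means the portfolio is within policy and nothing needs to be sent.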

ETF watchlist email notification Through Python

Email notification. I finally bit the bullet and updated my previously hideous email notification! You may find the updated email notification template here, along with the code. Feel free to ping me if you are keen to be on the email list too. ~ Jirong import smtplib, ssl import datetime import pandas as pd from email.mime.text import MIMEText from email.mime.multipart import MIMEMultipart #Format text data = pd.read_csv('/home/jirong/Desktop/github/ETF_watchlist/Output/yahoo_crawled_data.csv') data['Change_fr_52_week_high'] = round(100 * data['Change_fr_52_week_high'], 1) data = data[['Name', 'Price', 'Change_fr_52_week_high']].
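
As a rough idea of how such a notification can be assembled, here is a minimal sketch that builds (but does not send) an HTML email with a plain-text fallback using the standard-library `email.mime` classes. The function name `build_watchlist_email`, the sample rows, the subject line and the addresses are all hypothetical, and this is not the template linked in the post; only the three column names come from the excerpt above.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_watchlist_email(rows, sender, recipient):
    """Build a multipart email from (name, price, change_from_52_week_high) rows."""
    header = "<tr><th>Name</th><th>Price</th><th>Change_fr_52_week_high</th></tr>"
    body_rows = "".join(
        f"<tr><td>{name}</td><td>{price:.2f}</td><td>{change:.1f}</td></tr>"
        for name, price, change in rows
    )
    html = f"<html><body><table>{header}{body_rows}</table></body></html>"
    plain = "\n".join(f"{name}: {price:.2f} ({change:.1f}%)" for name, price, change in rows)

    # 'alternative' lets the mail client choose between the plain and HTML parts.
    message = MIMEMultipart("alternative")
    message["Subject"] = "ETF watchlist"
    message["From"] = sender
    message["To"] = recipient
    message.attach(MIMEText(plain, "plain"))
    message.attach(MIMEText(html, "html"))
    return message


# Made-up rows standing in for the crawled data.
sample_rows = [("ETF A", 101.25, -3.2), ("ETF B", 54.10, -12.7)]
email_message = build_watchlist_email(sample_rows, "me@example.com", "you@example.com")
print(email_message["Subject"])
print(email_message.get_payload()[0].get_payload())
```

Sending would then be a matter of opening an SMTP connection (for example with `smtplib.SMTP_SSL` and `ssl.create_default_context()`), logging in, and passing `email_message.as_string()` to `sendmail`; the server details depend on your mail provider.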

Convert NAs to Obscure Number in Data Frame to Aid in Recoding/ Feature Engineering

Converting NAs to obscure numbers to prevent them from messing up the recoding. One issue that I encounter while data-munging is that NAs in the data mess up my recoding. Here’s a neat Swiss Army knife utility function I developed recently. suppressMessages(library(dplyr)) # Converting NA to obscure number to prevent awkward recoding situations that require & !is.na(<variable>) # Doesn't work for factors #' @title Convert NA to obscure number #' @param dp_dataframe Dataframe in consideration #' @param np_obscure_num Numeric - Obscure number #' @param bp_na_to_num Boolean if TRUE, convert NA to num.

Loading excel data with correct variable types

Loading data with data types. When reading static files into R or Python, most of the time we are lazy and load the data with no regard for the data types. But in mission-critical ETL jobs or data analytics workflows, data types are essential, and getting them wrong can be the difference between life and death. OK, I’m exaggerating here. What I’ve written below is a Swiss Army knife function to read an Excel file in which the 1st tab holds the data and the 2nd tab holds the variable types (e.

Function to describe clusters derived from unsupervised learning

Describing unsupervised learning clusters. As data scientists / analysts, besides doing cool modelling stuff, we’re often asked to churn out descriptive statistics. Yes, we know. It’s part of the process. I chanced upon this really nifty concept at work for describing the clusters derived from unsupervised learning. Here’s how it goes. Say it’s a nominal or ordinal variable. First, I find the proportion of the feature within each of the X clusters. Second, I rank these proportions by percentile across the X values. The cluster with the highest percentile earns the right to be represented by the feature. And if it’s a scale variable, you may find the mean of the feature for each cluster and repeat the steps.
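
The steps above can be sketched for a single nominal feature as follows. This is an illustration under assumptions, not the function from the post: the helper names, the toy cluster labels and the "owns_car" feature are made up, and the percentile rank is taken to be the share of clusters whose proportion is less than or equal to the cluster's own.

```python
from collections import Counter


def feature_proportions(cluster_labels, feature_values, level):
    """Step 1: proportion of rows in each cluster whose feature equals `level`."""
    totals = Counter(cluster_labels)
    hits = Counter(c for c, v in zip(cluster_labels, feature_values) if v == level)
    return {cluster: hits[cluster] / totals[cluster] for cluster in totals}


def percentile_ranks(proportions):
    """Step 2: percentile rank of each cluster's proportion across all clusters."""
    n = len(proportions)
    return {
        cluster: sum(other <= p for other in proportions.values()) / n
        for cluster, p in proportions.items()
    }


def representative_cluster(cluster_labels, feature_values, level):
    """Step 3: the cluster with the highest percentile is represented by the feature."""
    ranks = percentile_ranks(feature_proportions(cluster_labels, feature_values, level))
    return max(ranks, key=ranks.get)


# Toy data: three clusters of four rows each, with a hypothetical yes/no feature.
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
owns_car = ["yes", "no", "no", "no",
            "yes", "yes", "yes", "no",
            "yes", "yes", "no", "no"]

proportions = feature_proportions(clusters, owns_car, "yes")
print(proportions)
print(percentile_ranks(proportions))
print(representative_cluster(clusters, owns_car, "yes"))
```

For a scale variable, the same ranking would be applied to per-cluster means instead of proportions, as the last step in the description suggests.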