R
Fuzzy matching
As a computer science graduate, I always strive to reduce computational complexity through parallelization or vectorization!
Explicit loops in data science are the root of evil!
For loops and while loops have their place, but definitely not in the data science space (a fairly broad statement, I admit).
In this post, I hope to show a really cool example that avoids the dreaded O(n²) complexity.
I will be using fuzzy matching to find the closest match for each string in data frame 2 (df2) against data frame 1 (df1).
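To make the idea concrete, here is a minimal sketch of loop-free fuzzy matching using only base R. It assumes edit distance (via `adist()` from the `utils` package) as the similarity measure, which may differ from the method used in the full post. The sample data and the `closest_match` and `distance` columns are made up for illustration. Note that `adist()` still compares every pair of strings internally; what disappears is the explicit R-level loop.

```r
# Hypothetical example data; only the names df1 and df2 come from the text.
df1 <- data.frame(name = c("apple", "banana", "cherry"), stringsAsFactors = FALSE)
df2 <- data.frame(name = c("appel", "chery", "bananna"), stringsAsFactors = FALSE)

# adist() returns a matrix of edit distances:
# one row per df2 string, one column per df1 string.
dist_mat <- adist(df2$name, df1$name)

# For each row, pick the column with the smallest distance (first one on ties).
best_idx <- max.col(-dist_mat, ties.method = "first")

# Attach the closest df1 string and its distance to df2.
df2$closest_match <- df1$name[best_idx]
df2$distance <- dist_mat[cbind(seq_len(nrow(df2)), best_idx)]

print(df2)
```

Building the full distance matrix costs memory proportional to `nrow(df1) * nrow(df2)`, so for large tables it may be worth matching in chunks.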
Converting NAs to obscure numbers to prevent the data from messing up the recoding
One issue I encounter while data munging is that NAs in the data seem to mess up my recoding. Here’s a neat Swiss Army knife utility function I developed recently.
suppressMessages(library(dplyr))

# Converting NA to obscure number to prevent awkward recoding situations that require & !is.na(<variable>)
# Doesn't work for factors
#' @title Convert NA to obscure number
#' @param dp_dataframe Dataframe in consideration
#' @param np_obscure_num Numeric - Obscure number
#' @param bp_na_to_num Boolean if TRUE, convert NA to num.
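The excerpt above stops before the function body. As a rough illustration only, here is one way such a helper could be written in base R. The function name `convert_na_obscure`, the default of -9999, and the reverse behaviour when `bp_na_to_num` is FALSE are all assumptions of mine, not the original code; only the three parameter names come from the header above.

```r
# Hypothetical sketch; the name, defaults and body are assumptions.
convert_na_obscure <- function(dp_dataframe, np_obscure_num = -9999, bp_na_to_num = TRUE) {
  dp_dataframe[] <- lapply(dp_dataframe, function(col) {
    # Only numeric columns are touched; factors and characters pass through.
    if (!is.numeric(col)) return(col)
    if (bp_na_to_num) {
      # Replace NA with the obscure number.
      col[is.na(col)] <- np_obscure_num
    } else {
      # Assumed reverse direction: turn the obscure number back into NA.
      col[!is.na(col) & col == np_obscure_num] <- NA
    }
    col
  })
  dp_dataframe
}

# Made-up data to show a round trip.
df <- data.frame(age = c(25, NA, 40), grp = c("a", NA, "c"), stringsAsFactors = FALSE)
recoded  <- convert_na_obscure(df)
restored <- convert_na_obscure(recoded, bp_na_to_num = FALSE)
```

The obscure number should be a value that can never occur in the real data, otherwise the reverse step would wrongly turn genuine values into NA.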
Decomposing a Position Into Exchange Rate and Non Exchange Rate Effects
If you have a stake in foreign positions, this package I wrote may be a useful tool to help you understand the impact of foreign currency on your positions. For instance,
If you are an investor, you may use it to analyze the impact of exchange rates on your investment positions. If you are in the treasury department, you may wish to analyze the impact of exchange rates on your bonds.
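For readers who want to see what such a decomposition can look like, here is a small sketch. It uses one common arithmetic identity (change in home-currency value = non-exchange-rate effect + exchange-rate effect + interaction) and is not necessarily how the package computes it. All figures and variable names are invented for illustration.

```r
# Hypothetical figures: a position valued in foreign currency at two dates,
# and the exchange rate (home currency per unit of foreign currency).
value_foreign <- c(start = 1000, end = 1100)
fx_rate       <- c(start = 1.30, end = 1.40)

# Value of the position in home currency at each date.
value_home   <- value_foreign * fx_rate
total_change <- unname(value_home["end"] - value_home["start"])

# Change in the underlying position, valued at the starting exchange rate.
non_fx_effect <- unname(diff(value_foreign) * fx_rate["start"])

# Change in the exchange rate, applied to the starting position.
fx_effect <- unname(value_foreign["start"] * diff(fx_rate))

# Cross term: both the position and the rate moved.
interaction <- unname(diff(value_foreign) * diff(fx_rate))

# The three effects add up to the total change.
print(c(total = total_change, non_fx = non_fx_effect,
        fx = fx_effect, interaction = interaction))
```

How the interaction term is allocated (reported separately, or folded into one of the two effects) is a design choice, and different tools handle it differently.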
Shift-share Analysis Package I developed
During my career, I have often had to deal with compositional and within-group effects. For instance, suppose the employment rate fell by 3% across 2 periods. How much of that is due to a change in the employment rate within each sub-group, and how much is due to a compositional shift (for example, an ageing population)?
A formal way to explain these effects is known as shift-share analysis.
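As a rough illustration of the idea, the sketch below splits a change in an overall rate into a compositional effect, a within-group effect, and an interaction term, using vectorized base R. The two groups, their shares and their rates are invented numbers, and the package itself may define or allocate the terms differently.

```r
# Hypothetical data: population shares and employment rates for two groups.
share_0 <- c(young = 0.6, old = 0.4)    # period 1 shares
share_1 <- c(young = 0.5, old = 0.5)    # period 2 shares (ageing population)
rate_0  <- c(young = 0.80, old = 0.50)  # period 1 employment rates
rate_1  <- c(young = 0.81, old = 0.49)  # period 2 employment rates

# Overall rate is the share-weighted average of the group rates.
overall_0 <- sum(share_0 * rate_0)
overall_1 <- sum(share_1 * rate_1)
total_change <- overall_1 - overall_0

# Compositional effect: shares move, rates held at period 1 levels.
compositional <- sum((share_1 - share_0) * rate_0)

# Within-group effect: rates move, shares held at period 1 levels.
within_group <- sum(share_0 * (rate_1 - rate_0))

# Interaction: shares and rates move together.
interaction <- sum((share_1 - share_0) * (rate_1 - rate_0))

print(c(total = total_change, compositional = compositional,
        within = within_group, interaction = interaction))
```

In this made-up case the overall rate falls by 3 percentage points, and almost all of the fall comes from the shift in composition rather than from changes within the groups.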