
Welcome to 'The Rabbit Hole'!


Mood: Hungry

This is the first of hopefully many additions to this blog, and in general to the parent site. The purpose of this blog is to record a journal-style approach to my dive into various topics within the world of statistical learning and data science. I chose the name “The Rabbit Hole” because I feel like every time I get materially invested in a data-related project (as opposed to simply being expected to participate XD) the space of exploratory options scales exponentially as the amount of data (and options for modeling/visualizing it) increases.

In my experience the term “data science” gets tossed around a lot across various disciplines, so its meaning has evolved to be quite variable; I still like it, though, because it does a good job of encapsulating a broad set of problems and problem-solving skills that are clearly inter-related, yet in many cases distinct from what I learned in my formal education (B.S./M.S. in Statistics). I admit I’m not such a fan of the job title: after sampling a couple dozen job descriptions with “Data Scientist” in the title, I found significant heterogeneity across the lot. Sometimes I see job roles as classes in an RPG: there is inevitable overlap (e.g., Rangers and Wizards might both know some of the same spells), but each class has a well-defined role the others can’t fill in most scenarios. With “Data Scientists,” though, it feels like there’s some base class - “Data Person/Professional,” I guess? - and each company wants its own unique sub-class, dressed up with the “Scientist” label. There’s almost not enough overlap for these roles to be considered the same thing, you know?

Wizard and Ranger

Here are a couple examples (required and preferred skills for role) I found when I pored over new openings on LinkedIn:

  1. Deloitte
    • Machine Learning
    • Artificial Intelligence
    • Python
    • SQL
    • Data Visualization
    • MLOps Experience
    • LLMs and RAG
  2. Stripe
    • Machine Learning
    • Optimization
    • Causal Inference
    • SQL
    • Python
    • Tableau or PowerBI
    • Experience with Salesforce
  3. Amazon
    • Python
    • Optimization
    • Computer Vision
    • NLP
    • Deep Learning
    • Java or C++
    • Patent or Publication at Top-Tier Peer-reviewed Conference or Journal
  4. Microsoft
    • Python
    • SQL
    • C#
    • Time Series Analysis
    • Managing Unstructured Data
    • Bayesian Inference
    • Productionizing Models
  5. Meta
    • SQL
    • Python
    • R
    • Experimental Design
    • Understanding Ecosystems
    • Survey Sampling Methods
    • Quasi-experimental Methods

So I mean, it’s clear there is overlap, but when I was coming out of school this huge breadth of requested skills and knowledge made me feel very tiny, and like I was not fit for such roles. So I backed away slowly from a space that felt too daunting… and eventually I took a role in Model Risk at a mid-sized bank. I’m very grateful for the job I have and what I’ve learned there; but, in hindsight, it would have been nice to have a mentor point out that only a few of these “required data science skills” appear to be (nearly) universal. The rest are either specializations, company-specific needs, or not even required - just preferred.

Coming from a traditional statistics background - especially from BYU’s program at that time - my specialty was never going to be low-level, optimized computing for deep neural network architectures, or MLOps, or productionizing models. But I received a strong mathematical and statistical foundation, became an expert in R, gained some experience with Python and SQL, and co-authored an academic article published in a peer-reviewed journal. I had great capabilities, and yet I had convinced myself it still wasn’t enough for big bad Data Science.

Now, five years later, I’m hoping to use my experience and elevated (tiny bit, not too much) maturity to piece together more data-related skills and showcase them. Firstly, I just enjoy learning; but I also want to be prepared for any great opportunity that might come my way to take on a data science role in the future. Such roles (1) typically pay more :) and (2) provide more opportunities to jump into The Rabbit Hole. This roadmap may change, but I think my “main quest” is:

Demonstrate that I can build data pipelines, fit highly performant predictive models, and present my findings through effective reporting and visualization

Consequently, my current approach involves independent study across the following skillsets that I’m lacking:

  1. Become a Python and SQL Expert: I’ve hidden behind my love for R for a bit too long. R is not bad, and it is widely used, so I’m happy I’m good with it. But my SQL skills are well below where I want them to be, and there’s no denying that Python is the go-to language for the majority of modern data analytics, deep learning, and data processing. Even if I end up using R for the rest of my career, mastering these two languages will be invaluable.

  2. Build, Build, Build: Since my current job doesn’t offer much opportunity for creative model-building - or at least not in a form I can share to demonstrate what I know - I will be posting specific modeling projects to my GitHub Pages site as a portfolio.

  3. Low-level Language: I feel a need to become more proficient in at least one lower-level, compiled language. C++, Scala, and C# are all good options, but I haven’t “decided.” C++ is probably the best overall choice, but for some reason I want to learn Scala - and there’s some merit to that, given that Spark is written in Scala and Scala is very similar to Java. Kotlin runs on the JVM as well and is used for Android app development, so there might be some cross-over (?) IDK.

  4. Data Visualization: I’m good with custom on-the-fly visualizations in R, but I feel like I need to expand the scope of my skills to figure out how to do all those same things in Python AND learn (a) Tableau or PowerBI basics and (b) a versatile framework like D3.js.

  5. Machine Learning & Deep Learning: Most of my experience is in generalized linear models, tree-based learners, and some Bayesian approaches. This is good, but many roles will require expertise in neural networks and scaling what I already know to large datasets. This will involve becoming familiar with parallelization techniques, cloud frameworks, and deep learning libraries like TensorFlow and/or PyTorch. I’ve already gotten my feet wet with PyTorch, which will likely be the special guest for my next blog post (so come back for it!).
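On item 1, here’s a toy illustration of the kind of drill I have in mind: Python’s built-in sqlite3 module lets you practice SQL with zero setup. (The table and numbers below are invented for the example.)

```python
import sqlite3

# In-memory database: nothing to install, nothing to clean up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (branch TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [("Provo", 1200.0), ("Provo", 800.0), ("Orem", 500.0)],
)

# The SQL being drilled: aggregation with GROUP BY and ORDER BY.
rows = conn.execute(
    "SELECT branch, SUM(amount) AS total FROM loans "
    "GROUP BY branch ORDER BY total DESC"
).fetchall()

print(rows)  # [('Provo', 2000.0), ('Orem', 500.0)]
```

The nice part is that the same query logic transfers directly to any real warehouse later; only the connection line changes.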
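And on item 5, a quick sketch of why libraries like TensorFlow and PyTorch are approachable from a stats background: at their core they automate one mechanic, gradient descent. Here is a deliberately tiny, library-free version of that mechanic - fitting a slope to made-up data with a hand-derived gradient (data and learning rate are invented for the toy):

```python
# Toy gradient descent: fit y = w * x to made-up data, no libraries.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.0]  # roughly y = 2x

w = 0.0    # single parameter to learn
lr = 0.01  # learning rate, chosen by hand for this toy

for _ in range(500):
    # Mean squared error: L = (1/n) * sum((w*x - y)^2)
    # Gradient w.r.t. w:   (2/n) * sum((w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # the update step an optimizer performs for you

print(round(w, 2))  # converges near 2 (the least-squares slope is ~2.02)
```

PyTorch’s contribution is computing that `grad` line automatically (autograd) for models with millions of parameters; the loop itself is conceptually the same.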

Anyways, thanks for reading. I’m hoping to get these posts out at least weekly (maybe every Friday or Sunday). It would be really cool to look back in two years on 100+ blog posts documenting my evolution to that point. There’s also a chance I take what I learn along the way and develop a freelance consulting gig or something - you never know what the future holds! Much to the chagrin of many “Data Scientists” :)

© 2025 Spencer Newcomb   •  Powered by Soopr   •  Theme  Moonwalk