My top 5 takeaways from Build a Career in Data Science (Getting started with data science) – PART 1
Posted on: February 25, 2024
Post Category: Book Notes
‘Build a Career in Data Science’ (by Emily Robinson and Jacqueline Nolis) is one of the most consequential books I’ve read for my growth in the data analytics space.
It helped me write my resumes. It helped me write my cover letters. And it helped me showcase the personal projects I’d built back when I didn’t have much real-world experience.
I owe a lot to this book.
I haven’t read all of it (the book is organised around the different stages of building your career), so I decided to get back into it: to revisit the basics and to read the sections still ahead of me as I settle in and grow.
But for this particular blog post, I’ll be focussing on the first part of the book: Getting started with data science.
Pretty elementary stuff if you’re already quite familiar with data science… but you might still find some interesting perspectives here.
So here are my 5 key takeaways from this part of the book:
1. The what of data science
Data science is the practice of using data to try to understand and solve real-world problems.
Using computing tools, data scientists might produce reports, dashboards or deployable machine learning models.
For example, if a retail company wants to open up a new store, or increase online order sizes by recommending items to customers while they shop, a data scientist could get involved by doing analysis or building a model.
2. The required skills of a data scientist – and the different types of data scientist roles
Fundamentally, these are the core skills that make up data science:
- Domain knowledge / business understanding: knowing how to translate a business situation into a data question, finding the data answer, and delivering the business answer.
- Programming and databases: the ability to pull data from company databases and to write clean, efficient, maintainable code (mainly involving SQL, R and Python – and Git for version control).
- Maths and stats: this includes data literacy, knowing what techniques exist, how to apply the techniques and how to choose which techniques to try.
Put simply, different data science roles sit at different intersections of these skills:
- An analyst sits at the intersection of domain knowledge and programming/databases, typically creating dashboards and reports that deliver data.
- A machine learning engineer sits at the intersection of programming/databases and maths/stats, creating models that run continuously.
- A decision scientist sits at the intersection of maths/stats and domain knowledge, using analyses and experiments to produce recommendations.
Some other related jobs include:
- Business intelligence analyst: an analyst role that needs less statistical and programming expertise; the output is less sophisticated, but the role is a good entry point into data science.
- Data engineer: responsible for keeping data neatly stored and formatted in databases, so that people can get the data they need. Computer science / software engineering skills are helpful here.
- Research scientist: develops and implements new tools, algorithms and methodologies, often for other data scientists within the company to use. These positions almost always require a PhD, usually in computer science, statistics, quantitative social science or a related field.
3. The different data science companies you might come across – and the primary challenges/benefits of each
There are five archetypes for companies that hire data scientists:
- The massive tech company: your team is likely to be large and full of experienced people; people serve in many specialised roles across the company, and the tech stack varies from team to team. The work often involves solving big problems with cutting-edge technologies, but you’ll have little freedom to make big decisions.
- The established retailer: data science teams are fairly new and mostly do reporting; they’re not yet ready to run their own models continuously. There’s often a good amount of freedom to build whatever might deliver value, along with a fairly old tech stack (because “that’s how it’s always been”).
- The early-stage startup: you may be the first data science hire; work can get chaotic, but you’ll learn lots of skills very quickly. Expect a brand-new (but fragile) tech stack, no bureaucracy and a lot of freedom.
- The late-stage, successful tech startup: data science is recognised at the company level, with a good number of data scientists around to provide support and mentorship. There’s usually a good amount of freedom and plenty of chances to learn.
- The giant government contractor: work is often slow, comfortable and secure, which isn’t ideal if you want a challenge. Expect a lot of bureaucracy and an ancient tech stack.
Each type has its own trade-offs, and all are worth considering when building your career.
4. The different ways of attaining data science skills/knowledge
There are four main ways:
- Earning a graduate degree in data science or a related field
- Participating in a data science bootcamp
- Doing data science work in your current job
- Teaching yourself – through online courses and data science books
For earning a graduate degree:
- The upside is that you get the knowledge you need to get started as a junior data scientist – through coursework and projects involving statistics, programming and machine learning.
- The downside is that it’s extremely expensive, may not be worthwhile if you already have a background related to data science, and the course content may not align with what’s used in industry.
For bootcamps:
- The upside is that you learn an incredible amount in a short amount of time, typically going beyond technical skills, including opportunities to work on projects and network with people.
- The downside is that you may not be able to apply for jobs during your bootcamp, since it’s a large time commitment. This means you may be on the job market for several months after completing it. It also costs a lot more than teaching yourself.
For doing data science work in your current job:
- Here, you leverage your relationships with people in data science-adjacent jobs, and you’re proactive about getting involved in more data science projects (without becoming a burden on anyone else).
- The upside is that it’s self-motivated and saves the cost of going to a bootcamp or graduate program.
- The downside is that it depends on whether you have people around you who have the skills and can mentor you.
For being self-taught:
- The upside is that courses and books provide a good enough grounding for people to teach themselves.
- The downside is that there is little structure and it’s a lot more difficult showing your qualifications on a resume. This is the least-recommended way to become a data scientist.
5. The how-to for building your portfolio
A portfolio is a set of data science projects that you can show to people, so they can see what kind of data science work you can do.
It’s not necessary to have one, but it’s a great way to help you stand out and to practice your data science skills.
A portfolio project involves the following main components:
- A dataset followed by a question (or a question followed by a dataset)
- An analysis that aims to answer the question
- A result – shared/communicated through a GitHub repository and/or blog post
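To make these three components concrete, here’s a minimal Python sketch of a toy portfolio project. The dataset, the question and every name in it (`orders`, `average_order_value_by_day`) are hypothetical, made up for illustration rather than taken from the book; a real project would start from a real dataset and share its result in a README or blog post.

```python
from collections import defaultdict
from statistics import mean

# Dataset: a tiny, made-up sample of online orders (hypothetical data,
# standing in for something you might find on Kaggle or an open-data portal).
orders = [
    {"day": "Mon", "order_value": 42.0},
    {"day": "Mon", "order_value": 18.5},
    {"day": "Sat", "order_value": 65.0},
    {"day": "Sat", "order_value": 71.5},
    {"day": "Wed", "order_value": 30.0},
]

# Question: on which day of the week is the average order value highest?
def average_order_value_by_day(rows):
    """Group order values by day and return {day: mean order value}."""
    values_by_day = defaultdict(list)
    for row in rows:
        values_by_day[row["day"]].append(row["order_value"])
    return {day: mean(values) for day, values in values_by_day.items()}

# Analysis: compute the averages and pick the day with the largest one.
averages = average_order_value_by_day(orders)
best_day = max(averages, key=averages.get)

# Result: a short, shareable summary (the kind of line you might put in a README).
summary = f"Highest average order value: {best_day} ({averages[best_day]:.2f})"
print(summary)  # prints "Highest average order value: Sat (68.25)"
```

Keeping the question to a single, answerable line like this also makes it easier to avoid overscoping.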
And here are a few things to keep in mind while building projects:
- Browsing datasets can help. You can find inspiration in sources such as Kaggle, datasets published by news outlets, APIs, government open data and your own data.
- Use your project to force yourself to learn something new.
- Do not overscope. Set yourself one question and answer it, and if the analysis is something that could be revisited later, you can scope another project then.
- Include a filled-out README file in your GitHub repository, explaining the project and how the repository is organised.
- You can blog on your own website (which can be created with the R package blogdown) or on a blogging platform like Medium.
- Blog posts can cover your experiences, a fun project you did, or code- or theory-intensive tutorials.
In the next part, I’ll cover how to prepare for applying to data science jobs.
Stay tuned (or check out my more recent posts, if it’s been shared already)!
About the author
Jason Khu is the creator of Data & Development Deep Dives and currently a Data Analyst at Quantium.