Adam Gorski: “Nowadays, a basic understanding of data is almost a prerequisite in the business world.”
Adam Gorski started his career as a Data Analyst for Oakland, CA public schools and grew to Senior Data Scientist at DocuSign, a company valued at $1.5 billion, in just 4 years.
With more than 5 years of experience in building data systems, Adam is extremely passionate about using data to solve problems in the private, public, and non-profit sectors.
During this interview, Adam shared his views on:
- What SQL is and why it is still relevant;
- The Data Scientist role;
- Data in business;
- How SQL skills can improve one's career;
- SQL course @ ELVTR.
I wouldn’t be surprised if we were still using SQL 20 years from now.
For those who aren’t coming from a data-centric space, what is SQL? Why is it so commonly used for business purposes?
[There are many ways to] structure data, but we are talking about Relational Databases (a means of storing related data points organized as a collection of tables), which are the most common way to store data. The way to access that data is SQL’s [operational language]. In fact, SQL is the most common language for querying databases in a business context.
The business world is becoming more and more driven by data. And so, understanding the most common language for accessing data is a good [method of] understanding how data is structured, and how to use data to provide insights.
For example, at DocuSign, data has a massive impact on everything we do on a day-to-day basis. All of our decisions are tied to metrics.
This is the analogy I like to use: you used to hire a Typist because it was a specialized skill, but now everyone can type. Nowadays, a basic understanding of how data is stored, how to query it, and how to derive insights from it is almost a prerequisite in the business context.
SQL [remains popular because it] provides users with direct access to data, meaning that there's going to be less friction between the user and whatever insights they’re trying to derive.
You’ve said that there are many ways to structure data. If there are newer methods for storing data, why is SQL still relevant?
[SQL is 50 – 60 years old], [and] I would say that its age speaks to how robust the language is. SQL has gone through many permutations and iterations. There are various types of SQL databases (MySQL, MariaDB, Oracle, Azure, etc.); people call them flavors.
However, the fact that [business professionals] are still using it on a regular basis, in a business context, speaks to its continued relevance.
In the same way, we may feel that the QWERTY keyboard might not be ideal, but it remains an aspect of our daily life. [In that way], I wouldn’t be surprised if we were still using SQL 20 years from now. SQL is entrenched in all industries.
80 to 90% of data work is getting and cleaning data.
Tell us about your journey of becoming a Data Scientist.
Initially [I was] interested in public policy [surrounding] education and education [industry] data. So, I went to school and studied Data Analysis and Educational Policy. After that, I worked for an education nonprofit called Great Schools. That was my first real introduction to working with big data at scale.
[The aforementioned data] needed to arrive in a dynamic way; we had to pull it from all 50 states, plus DC, on a regular basis. That data needed to be processed. We had to surface it to our millions of users, which really got me interested in data analysis and the power of data.
From there, I moved into a space where I was doing what's called Data Science, which has to do with building models and running experiments. And now, I work as a Data Scientist at DocuSign, where I build models.
[To do so], I extract features (the most valuable data traits), and surface Business Insights.
What does a typical day as a Data Scientist look like? Are there any misconceptions surrounding the field?
[Often, students come to class] and expect the data [to] be in the shape that it needs to be to do their analysis. They're all excited to do all kinds of fancy models, but the bulk of the work is curating data.
[The undeniable fact is] 80 to 90% of data work is getting and cleaning data. [In practice], data comes from disparate sources and needs to be merged. And to successfully curate data, duplicate entries must be avoided.
If you came from [an] academic environment, you [were most likely] given a data set that has been [stripped down to applicable figures] and cleaned by your instructors. By doing so, the instructors were shortchanging you by 80% of what you're going to be doing on a daily basis, because data is messy.
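As a minimal sketch of the merging and de-duplication described above: the two source tables, their columns, and their rows below are hypothetical, and Python's built-in sqlite3 module stands in for whatever database a team actually uses.

```python
import sqlite3

# Hypothetical data: two sources that list schools, with one overlapping row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_a (school_id INTEGER, name TEXT);
    CREATE TABLE source_b (school_id INTEGER, name TEXT);
    INSERT INTO source_a VALUES (1, 'Lincoln'), (2, 'Roosevelt');
    INSERT INTO source_b VALUES (2, 'Roosevelt'), (3, 'Washington');
""")

# UNION (unlike UNION ALL) drops exact duplicate rows while merging.
merged = conn.execute("""
    SELECT school_id, name FROM source_a
    UNION
    SELECT school_id, name FROM source_b
    ORDER BY school_id
""").fetchall()

print(merged)  # [(1, 'Lincoln'), (2, 'Roosevelt'), (3, 'Washington')]
```

Real data is rarely this tidy: near-duplicates (different spellings, stray whitespace) would not be caught by UNION and need their own cleaning rules.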
...it doesn't matter how crazy [or wrong] the theory is, I guarantee that a Data Scientist can find data to back it up.
Within the business context, who should have access to data?
The [belief] that access to data is a constraint is rapidly becoming less of a concern, even for [those] who don’t have the word "data" in their title.
And the reason for this is that it has become much easier to [replicate a database]. Through replication, businesses can provide access to those who aren’t necessarily Engineers or Developers.
Nowadays, replicas should be created in order to provide access [to data], so that [everyone] can derive their own insights. [For example], at DocuSign, we're modeling what I'd call self-service data. Instead of someone coming to an Engineer and saying, “here's the specific data set that I need,” and then having an Engineer fetch that data for them, [...] we teach basic queries so that everyone can pull data themselves.
And then that way, [employees] have autonomy. They can perform analysis in a read-only state. At DocuSign, we often have to query read-only replicas because our customers are accessing the production server.
[In general], companies are beginning to notice that the more people have access to information, the more insights individuals will glean for themselves and for the company. Because of this, the company becomes more robust and dynamic.
What can be achieved with an understanding of SQL’s operational language? What are the limitations?
[Say] you want to compare users who enrolled in May to [those] who [enrolled] in April. In order to do that, you need to monitor data on a daily basis. If you don't have access to the database, you'd have to contact an Admin in order to pull that data and perform your own analysis. [However], if you [had] the skills and access to the database, [you could] perform that analysis yourself, every day.
[And], if you have the skills to write a line of SQL code, then the analysis is essentially automated: every morning, every week, or however often you want to perform it, you [would simply have to] run your code.
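The April versus May comparison could look something like the sketch below. The enrollments table, its columns, and the sample dates are assumptions made up for illustration, and Python's built-in sqlite3 module is used only so the query can be run anywhere.

```python
import sqlite3

# Hypothetical enrollment data; a real table would live in the company database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollments (user_id INTEGER, enrolled_on TEXT);
    INSERT INTO enrollments VALUES
        (1, '2023-04-03'), (2, '2023-04-18'),
        (3, '2023-05-02'), (4, '2023-05-09'), (5, '2023-05-27');
""")

def enrollments_by_month(connection):
    """Count enrolled users per month; rerun whenever fresh numbers are needed."""
    query = """
        SELECT strftime('%Y-%m', enrolled_on) AS month, COUNT(*) AS users
        FROM enrollments
        WHERE enrolled_on >= '2023-04-01' AND enrolled_on < '2023-06-01'
        GROUP BY month
        ORDER BY month
    """
    return connection.execute(query).fetchall()

monthly_counts = enrollments_by_month(conn)
print(monthly_counts)  # [('2023-04', 2), ('2023-05', 3)]
```

Because the query is saved as code, repeating the analysis tomorrow or next week means running the same function again rather than filing a new request.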
[Speaking of the power of data, it’s] important to understand what data can do, and its limitations. For example, if someone has already made up their mind and [searches for evidence to back up a theory], it doesn't matter how crazy [or wrong] the theory is: I guarantee that a Data Scientist can find data to back up [the aforementioned theory]. But that's not useful; it leads to making [bad decisions].
So, what we really need is a correct understanding of what data is and how to use it. I would say Data Scientists should think about data from a hypothesis perspective: [those who work with data] must develop hypotheses and then, using data, validate or invalidate those hypotheses, rather than [forcing data] to backfill things they believe.
Knowledge of SQL takes your skillset and scales it up.
Many ask, how can the ability to access and understand data advance my career?
Generally, [employees] have advanced qualitative skills. If [they] can build upon those with the skills of accessing and understanding data, they can turn that data into insights.
And data-derived insights open up all kinds of opportunities within the company. Understanding the role of data in the business context allows you to take the skills you already have and scale them.
What most companies, especially in the technology space, are interested in is scalability. Knowledge of SQL takes your skillset and scales it up. [It] takes what you know and allows it to be applied much more broadly.
[Here’s an example]: there was a researcher who needed to fill a spreadsheet about the number of unique users [...]. If the researcher had known a bit about the SQL queries that were used to generate those numbers, he could have slightly modified [the query] in order to get those numbers by himself.
[Knowledge of this operational language] would have made his work a lot faster. It would have allowed him to add additional characteristics and robustness to the data set that he was creating. [It] would have allowed him to understand the data better, rather than treating it as just a number that appeared magically from nowhere.
[If the researcher] understood the use of SQL, he would have concluded that this data comes from this set of tables that are joined in this particular way, etc. And that [would] allow him to understand the data better. [And, understanding SQL would] allow him to derive more insights going forward.
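To make the researcher example concrete, here is a hedged sketch of a unique-users query and a slightly modified version of it. The users and events tables, their columns, and the region breakdown are all hypothetical; the interview does not describe the actual query.

```python
import sqlite3

# Hypothetical tables: who the users are, and what they did.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, region TEXT);
    CREATE TABLE events (user_id INTEGER, event_type TEXT);
    INSERT INTO users VALUES (1, 'West'), (2, 'West'), (3, 'East');
    INSERT INTO events VALUES
        (1, 'login'), (1, 'login'), (2, 'login'), (3, 'login');
""")

# The "magic number": unique users, however many events each one generated.
total_unique = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM events"
).fetchone()[0]

# A small modification: join to the users table and break the count down by region.
unique_by_region = conn.execute("""
    SELECT u.region, COUNT(DISTINCT e.user_id) AS unique_users
    FROM events AS e
    JOIN users AS u ON u.user_id = e.user_id
    GROUP BY u.region
    ORDER BY u.region
""").fetchall()

print(total_unique)      # 3
print(unique_by_region)  # [('East', 1), ('West', 2)]
```

Reading the second query also shows where the number comes from: which tables are involved and how they are joined.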
How easy is SQL to learn compared to other programming languages?
I think that SQL has a pretty low barrier to entry compared to most programming languages. And by that I mean, SQL is a language that [would] read [much] like text, if you were to read it. So you can actually infer the commands, because [SQL] uses English verbs.
For example, the most basic query that you would use in SQL often starts with the word “SELECT,” [signifying that] you're going to select these pieces from the data set. And so, [SQL is] relatively easy to read.
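A basic query of the kind described above might look like this. The customers table and its rows are invented for illustration, and the query is run through Python's built-in sqlite3 module so it can be tried without any setup.

```python
import sqlite3

# Hypothetical table with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, city TEXT);
    INSERT INTO customers VALUES
        ('Ana', 'Oakland'), ('Ben', 'Seattle'), ('Cy', 'Oakland');
""")

# Reads almost like a sentence: select names from customers where the city is Oakland.
oakland_customers = conn.execute(
    "SELECT name FROM customers WHERE city = 'Oakland' ORDER BY name"
).fetchall()

print(oakland_customers)  # [('Ana',), ('Cy',)]
```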
I think about [SQL as] analogous [to] website domains. Twenty years ago, “.com” didn't mean anything [to most people]. But nowadays, everybody knows exactly what that means. [This isn’t much] different [from a] programming language — they spread and become part of our work repertoire.
[All-in-all], SQL is very readable and [easy to learn].
In today’s business context, employees must be able to [...] not only answer questions that are posed, but also generate questions themselves.
Why create a SQL course?
One of the gaps that I continue to see in the education space is a gap in understanding how data is structured and how to query that data. [In fact], I never learned the importance of SQL until I was in the industry.
Moreover, I’ve seen firsthand that business professionals understand that they need to be able to analyze data, but they are missing a critical piece [of the puzzle]: how to get that data, the means of accessing the data required to answer the questions that you want to answer. And SQL is the missing puzzle piece.
In today’s business context, employees must be able to [...] not only answer questions that are posed, but also generate questions themselves. So, [a great deal] of work in the data space is providing questions that can be answered by data.
I built this course with ELVTR for people who already have a firm understanding of the business that they work in, but want to be able to understand and improve that business with more data insights.