  • For many years, data science has been called the sexiest job of the 21st century.

  • But in recent years, it seems like there's a new job vying for that title, the AI engineer.

  • So who even are these new kids on the block?

  • Are they just data scientists in disguise?

  • What's up y'all? I'm Isaac Key, and I'm a former data scientist turned AI engineer at IBM.

  • To answer these questions, I'm going to lay out four key areas in which the work of a data scientist differs from that of an AI engineer, specifically a generative AI engineer.

  • But before I dive into these differences, we first have to understand more about what's happening in the industry.

  • So traditionally, data scientists have always used AI models to do their analysis.

  • So what's changed? Well, with the advent of generative AI, the boundaries of what AI can do are being pushed in ways that we've never seen before.

  • These breakthroughs have been so significant that generative AI has split off into its own distinct field, and we call that AI engineering.

  • Okay. So now that we understand the landscape, let's dive into the differences.

  • The first area of difference lies in the use cases.

  • So at a very high level, think of a data scientist as a data storyteller.

  • They take massive amounts of messy real-world data, and they use mathematical models to translate this data into insights.

  • On the other hand, think of an AI engineer as an AI system builder.

  • They use foundation models to build generative AI systems that help to transform business processes.

  • So since data scientists are fantastic storytellers, they use a lot of descriptive analytics to describe the past.

  • One example of this is through what's called Exploratory Data Analysis or EDA, which is all about graphing the data and doing statistical inference.

  • They can also do this through what's called clustering, which groups similar data points based on shared characteristics, for example when doing customer segmentation.
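
As a rough illustration of the clustering idea just described, here is a minimal one-dimensional k-means sketch in plain Python. The spend figures, the choice of two clusters, and the function name `k_means_1d` are all made up for illustration; they are not from the video.

```python
from statistics import mean

def k_means_1d(values, centroids, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly spend per customer (illustrative numbers only).
spend = [12, 15, 14, 90, 95, 88]

# Start from the smallest and largest value as the two initial centroids.
centroids, clusters = k_means_1d(spend, centroids=[min(spend), max(spend)])
print(clusters)
```

With this toy data the low spenders and the high spenders land in separate segments, which is the kind of grouping a customer segmentation would look for.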

  • Now, every good story has a reader trying to figure out what's going to come next, and that's where predictive use cases come in.

  • As opposed to a book, however, a data scientist does not have the end already written, so they have to use what are called machine learning models to make their predictions.

  • One example of these is regression models, which predict a numeric value such as, say, a temperature or revenue.

  • Another type is classification models, which predict a categorical value such as a success or a failure.
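
To make the contrast between the two model types concrete, here is a small sketch in plain Python. It assumes a one-feature least-squares line for the regression case and a nearest-centroid rule for the classification case; both are simple stand-ins chosen for illustration, and all data values are invented.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def fit_centroids(xs, labels):
    """Nearest-centroid classifier: store the mean feature value of each class."""
    return {c: mean(x for x, l in zip(xs, labels) if l == c) for c in set(labels)}

def classify(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

# Regression: predict a numeric value (made-up ad spend -> revenue).
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ad_spend, revenue)
predicted_revenue = slope * 5.0 + intercept

# Classification: predict a category (made-up hours of testing -> outcome).
hours = [1.0, 2.0, 8.0, 9.0]
outcome = ["failure", "failure", "success", "success"]
centroids = fit_centroids(hours, outcome)
predicted_outcome = classify(centroids, 7.0)

print(predicted_revenue, predicted_outcome)
```

The regression output is a number on a continuous scale, while the classification output is one label from a fixed set.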

  • So putting on the AI engineering hat now, one of the main types of use case that AI engineers work on is prescriptive use cases, which are all about choosing the best course of action.

  • An example of this is a technique called decision optimization, which enables businesses to assess a set of possible actions and then choose the optimal path based on a set of requirements or standards.
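
A toy version of that idea can be sketched as a brute-force search: enumerate every combination of candidate actions, discard the ones that break a requirement, and keep the best of the rest. The actions, costs, benefits, and budget below are hypothetical, and real decision optimization would normally use a dedicated solver rather than enumeration.

```python
from itertools import combinations

# Hypothetical candidate actions: (name, cost, expected_benefit).
actions = [
    ("email campaign", 2, 3),
    ("discount", 5, 6),
    ("new store", 9, 10),
    ("loyalty app", 4, 5),
]
budget = 10  # the requirement: total cost must not exceed this

def best_plan(actions, budget):
    """Try every subset of actions and keep the highest-benefit one within budget."""
    best, best_value = (), 0
    for size in range(1, len(actions) + 1):
        for combo in combinations(actions, size):
            cost = sum(a[1] for a in combo)
            value = sum(a[2] for a in combo)
            if cost <= budget and value > best_value:
                best, best_value = combo, value
    return [a[0] for a in best], best_value

plan, value = best_plan(actions, budget)
print(plan, value)
```

Enumeration only works for a handful of actions, since the number of subsets doubles with each one added, but it shows the shape of the problem: a set of options, a constraint, and an objective.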

  • Another example of a prescriptive use case is creating what are called recommendation engines.

  • As an example, this can involve suggesting targeted marketing campaigns for a select customer base.

  • In addition to prescriptive use cases, there are also generative use cases, hence the name generative AI.

  • Now, foundation models, which I will touch on more in a bit, enable the creation of what are called intelligent assistants, for example a coding assistant or a digital advisor.

  • They also enable the creation of chatbots, which enable conversational search through information retrieval and the summarization of various content.

  • So after we have a use case identified, we need data.

  • Now, people say that data is the new oil because, like oil, you have to search for and find the right data and then use the right processes to transform it into various products, which then power various processes.

  • For a data scientist, the oil of choice is often structured data, aka tabular data.

  • Do note that data scientists still work with unstructured data, but not as much as AI engineers.

  • Now, these tables are often on the order of hundreds to hundreds of thousands of observations.

  • They require a lot of cleaning and pre-processing before the data can be modeled.

  • Some of the cleaning involves, for example, removing outliers, joining and filtering on a new table, or even creating new features altogether.

  • This clean data is then used to train various machine learning models.
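
The cleaning steps mentioned above (removing outliers, joining, filtering, creating a feature) might look roughly like this in plain Python. The two tables, the "five times the median" outlier rule, and the `amount_per_visit` feature are all assumptions made up for this sketch.

```python
from statistics import median

# Hypothetical orders table; customer 5 has an obviously suspicious amount.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 25.0},
    {"customer_id": 3, "amount": 22.0},
    {"customer_id": 4, "amount": 24.0},
    {"customer_id": 5, "amount": 5000.0},
]

# Hypothetical customers table, keyed by customer_id.
customers = {
    1: {"region": "EU", "visits": 4},
    2: {"region": "US", "visits": 5},
    3: {"region": "EU", "visits": 2},
    4: {"region": "EU", "visits": 8},
    5: {"region": "US", "visits": 1},
}

# 1. Remove outliers (arbitrary rule for illustration: more than 5x the median).
med = median(o["amount"] for o in orders)
cleaned = [o for o in orders if o["amount"] <= 5 * med]

# 2. Join each order with its customer record.
joined = [{**o, **customers[o["customer_id"]]} for o in cleaned]

# 3. Filter on a column that came from the joined table.
eu_rows = [row for row in joined if row["region"] == "EU"]

# 4. Create a new feature from existing columns.
for row in eu_rows:
    row["amount_per_visit"] = row["amount"] / row["visits"]

print(eu_rows)
```

In practice a data scientist would typically do this with a dataframe library, but the sequence of steps is the same.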

  • Now, on the other hand, for an AI engineer, the oil of choice is mainly unstructured data, such as text, images, videos, audio files, etc.

  • Let's take a text-based foundation model called an LLM or large language model as an example.

  • These models require anywhere from billions to trillions of tokens of text to be trained on, which is a much larger scale compared to traditional machine learning models.

  • This leads me to the next area of difference, which is the underlying models.

  • So the data science toolbox consists of hundreds of different models and different algorithms that they can choose from.

  • Due to the nature of these models, each different use case requires gathering a different data set, and thus requires training a different model.

  • So as a result, the scope of these individual models is a lot more narrow, meaning that it's harder for them to generalize past the domain of data that they've been trained on.

  • Generally speaking, these models are a lot smaller in size in terms of the number of parameters.

  • They take less compute power to train and do inference, and they require less time to train, anywhere from seconds to hours.

  • Now, on the other hand, the generative AI toolbox is a lot less cluttered, and it really only contains one type of model, and that is called the foundation model.

  • Now, foundation models are revolutionary because they allow for one single type of model to generalize to a wide range of tasks without having to be retrained.

  • Thus, their scope is a lot wider.

  • Due to the sophistication of these models, they are a lot larger in size, often billions of parameters.

  • They require a lot more compute power to train. We're talking hundreds to thousands of GPUs.

  • And they require a lot more training time. Now, we're talking anywhere from weeks to months.

  • Due to the intrinsic differences between traditional machine learning models and foundation models, the underlying processes and techniques that are used to develop solutions with them also differ.

  • So, a typical data science process will look something like this.

  • You start off with a use case, and then from that use case, you pick the right data.

  • Then, after that data is prepared, you use it to train and validate a model using techniques such as feature engineering, cross-validation, or hyperparameter tuning, as an example.
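
As a sketch of what "train and validate" can mean here, the following plain-Python example uses k-fold cross-validation to tune one hyperparameter, the number of neighbors in a nearest-neighbor regressor. The data, the model choice, and the helper names are assumptions for illustration only.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for k interleaved folds."""
    for fold in range(k):
        val = list(range(fold, n, k))
        val_set = set(val)
        train = [i for i in range(n) if i not in val_set]
        yield train, val

def knn_predict(train_x, train_y, x, n_neighbors):
    """Average the targets of the n_neighbors training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:n_neighbors]
    return mean(y for _, y in nearest)

def cv_error(xs, ys, n_neighbors, k=3):
    """Mean squared error over all held-out points across the k folds."""
    errors = []
    for train, val in k_fold_indices(len(xs), k):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in val:
            pred = knn_predict(train_x, train_y, xs[i], n_neighbors)
            errors.append((pred - ys[i]) ** 2)
    return mean(errors)

# Made-up data: roughly y = 2x with a little noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0.1, 2.0, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 15.8]

# Hyperparameter tuning: score each candidate setting, keep the lowest error.
scores = {n: cv_error(xs, ys, n) for n in (1, 2, 3)}
best_n_neighbors = min(scores, key=scores.get)
print(scores, best_n_neighbors)
```

Each point is scored by a model that never saw it during training, which is what makes the comparison between hyperparameter settings fair.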

  • This model then is deployed at some endpoint, for example, in the Cloud to do real-time prediction and inference.

  • Now, on the other hand, the generative AI process also starts off with a use case, but then we can skip directly to working with a pre-trained model.

  • What makes this possible is a phenomenon called AI democratization, which is a big fancy word that simply means making AI more widely accessible to everyday users.

  • Some of the best foundation models out there are published to open source communities such as Hugging Face.

  • Since these models are so generalizable and so powerful out of the box, they make it easy for developers to get started.

  • AI engineers interact with these foundation models via natural language instructions to prompt them to do various tasks.

  • This process is known as prompt engineering.

  • Now, prompt engineering can be used in conjunction with different frameworks to then build larger AI systems.

  • Examples of these frameworks include chaining different prompts together, doing what's called parameter-efficient fine-tuning, or PEFT, on domain-specific data, doing retrieval augmented generation, aka RAG, to ground answers in truth, or even creating autonomous agents to reason through very complex multi-step problems.
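
To show the shape of retrieval augmented generation, here is a deliberately tiny sketch in plain Python. It assumes word-overlap scoring in place of a real retriever, and `fake_llm` is a hypothetical placeholder that stands in for a call to an actual foundation model; the documents and the question are invented.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of words."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context):
    """Prompt engineering step: instruct the model to answer from the context only."""
    lines = ["Answer using only the context below.", "Context:"]
    lines += [f"- {c}" for c in context]
    lines += [f"Question: {question}", "Answer:"]
    return "\n".join(lines)

def fake_llm(prompt):
    """Hypothetical stand-in for a foundation model call; returns a dummy string."""
    return f"[model output for a {len(prompt)}-character prompt]"

# Made-up knowledge base.
documents = [
    "The return policy allows refunds within 30 days.",
    "Shipping takes five business days.",
    "Our office is closed on public holidays.",
]

question = "What is the return policy for refunds?"
context = retrieve(question, documents)   # step 1: retrieval
prompt = build_prompt(question, context)  # step 2: grounded prompt
answer = fake_llm(prompt)                 # step 3: generation (stubbed here)
print(prompt)
print(answer)
```

The three steps are also a simple example of chaining: the output of retrieval feeds the prompt, and the prompt feeds the model.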

  • So these are just a few of the examples of the building blocks that can be used to build larger AI applications.

  • The last step is to then embed the AI in a larger system or workflow.

  • This can take on the form of creating assistants or virtual agents, building a larger application with a UI, or even doing some sort of automation.

  • So, okay, let's take a step back and let's look at all the differences at a very high level.

  • As we can see, the breakthroughs in generative AI underpin many of the differences in the use cases, data, models, and processes that data scientists and AI engineers work on.

  • It's important to note that there is still overlap between the two fields.

  • For example, data scientists will still work on prescriptive use cases or an AI engineer will still work with structured data.

  • Regardless of these differences, both of these fields are continuing to evolve at a blazing fast pace with new research papers, new models, new tools coming out every single day.

  • With data, AI, and a creative mind, really anything is possible with these.

  • Thank you for tuning in. I hope this was helpful.

  • Until next time, peace.

  • If you like this video and want to see more like it, please like and subscribe.

  • If you have any questions or want to share your thoughts about this topic, please leave a comment below.

Subtitles and vocabulary

B1 US data ai generative data scientist engineer model

Data Scientist vs. AI Engineer

  • 松崎洋介 posted on 2024/09/06
Video vocabulary

Keywords

massive

US /ˈmæsɪv/

UK /ˈmæsɪv/

  • adjective
  • Very big; large; too big
  • Extensive in scale or scope.
  • Solid and heavy.
  • Exceptionally large; huge.
  • Large or imposing in scale or scope.
process

US /ˈprɑsˌɛs, ˈproˌsɛs/

UK /prə'ses/

  • verb
  • To organize and use data in a computer
  • To deal with official forms in the way required
  • To prepare by treating something in a certain way
  • To adopt a set of actions that produce a result
  • To convert by putting something through a machine
  • noun
  • A series of actions or steps taken in order to achieve a particular end.
  • A summons or writ to appear in court or before a judicial officer.
  • A systematic series of actions directed to some end
  • Dealing with official forms in the way required
  • Set of changes that occur slowly and naturally
  • other
  • To perform a series of operations on (data) by a computer.
  • To deal with (something) according to a particular procedure.
  • Deal with (something) according to a set procedure.
  • To perform a series of mechanical or chemical operations on (something) in order to change or preserve it.
  • Take (something) into the mind and understand it fully.
  • other
  • Deal with (something, especially unpleasant or difficult) psychologically in order to come to terms with it.
technique

US /tɛkˈnik/

UK /tekˈni:k/

  • noun
  • Way of doing by using special knowledge or skill
  • The manner and ability with which an artist employs the technical skills of a particular art or field of endeavor.
  • A way of doing something, especially a skilled one.
  • A skillful or efficient way of doing or achieving something.
  • The skill or ability to do something well.
scale

US /skel/

UK /skeɪl/

  • noun
  • Size, level, or amount when compared
  • Small hard plates that cover the body of fish
  • Device that is used to weigh a person or thing
  • An instrument for weighing.
  • A sequence of musical notes in ascending or descending order.
  • Range of numbers from the lowest to the highest
  • The relative size or extent of something.
  • Dimensions or size of something
  • verb
  • To adjust the size or extent of something proportionally.
  • To change the size of but keep the proportions
  • To climb something large (e.g. a mountain)
  • To climb up or over (something high and steep).
  • To remove the scales of a fish
structure

US /ˈstrʌk.tʃɚ/

UK /ˈstrʌk.tʃə/

  • noun
  • The way in which the parts of a system or object are arranged or organized, or a system arranged in this way
  • The arrangement of and relations between the parts or elements of something complex.
  • A building or other man-made object.
  • The way in which the parts of a system or organization are arranged.
  • verb
  • To plan, organize, or arrange the parts of something
  • other
  • To construct or organize something.
develop

US /dɪˈvɛləp/

UK /dɪ'veləp/

  • verb
  • To explain something in steps and in detail
  • To create or think of something
  • To grow bigger, more complex, or more advanced
  • To make a photograph from film
  • other
  • To invent something or cause something to exist
  • To start to suffer from an illness or other medical condition
  • To improve the quality, strength, or usefulness of something
  • other
  • To (cause something to) grow or change into a more advanced, larger, or stronger form
interact

US /ˌɪntɚˈækt/

UK /ˌɪntər'ækt/

  • verb
  • To talk or do things with each other
  • other
  • To communicate or work together.
complex

US /kəmˈplɛks, ˈkɑmˌplɛks/

UK /'kɒmpleks/

  • noun
  • Group of buildings all used for the same purpose
  • Psychological issue regarding self-image
  • adjective
  • Not being simple; having many parts or aspects
research

US /rɪˈsɚtʃ, ˈriˌsɚtʃ/

UK /rɪ'sɜ:tʃ/

  • noun
  • Study done to discover new ideas and facts
  • A particular area or topic of study.
  • A department or group within an organization dedicated to conducting research.
  • A detailed report of the results of a study.
  • verb
  • To study in order to discover new ideas and facts
  • other
  • A particular area or topic of academic study or investigation.
  • The work devoted to a particular study.
  • Systematic investigation into a subject in order to discover or revise facts, theories, applications, etc.
  • The systematic gathering, recording, and analysis of data about issues relating to marketing products and services.
  • other
  • Systematic investigation to establish facts or collect information on a subject.
  • other
  • To study the market relating to marketing products and services.
  • To study (a subject) in detail, especially in order to discover new information or reach a new understanding.
  • other
  • To carry out academic or scientific research.
predict

US /prɪˈdɪkt/

UK /prɪ'dɪkt/

  • verb
  • To guess or estimate what will or might happen
  • other
  • To say or estimate that (a specified thing) will happen in the future or will be a consequence of something.