DECODING THE DATA UNIVERSE:
The State of Data Science and Machine Learning

Mike Leone, Principal Analyst
ENTERPRISE STRATEGY GROUP

September 2023

© 2023 TechTarget, Inc. All Rights Reserved.



Research Objectives
Several challenges are preventing organizations from successfully integrating machine learning (ML) models into their
software development lifecycle. Bridging the gap between different skill sets, handling complex and large data sets, managing
specialized hardware, and ensuring availability, scalability, and security in production collectively delay time to value and cause
organizational bottlenecks.

Given the increasing interest in, and complexity of, machine learning projects, organizations need greater agility, efficiency,
and performance, along with risk reduction through right-sized governance. Organizations recognize that they need clear data science
and machine learning strategies. As part of these strategies, MLOps can provide a structured and standardized approach to
developing, deploying, and maintaining ML models in production, helping organizations realize greater value. To gain further insight into these trends,
TechTarget's Enterprise Strategy Group (ESG) surveyed 366 professionals at organizations in North America (US and Canada)
involved with data science and machine learning technologies and processes, including potential responsibility for strategizing,
evaluating, purchasing, building, and managing these technologies.

This study sought to:

Identify investment plans, objectives, and challenges of data science and machine learning initiatives and projects.
Determine how organizations are prioritizing solutions to best help them succeed.
Establish the current state of operationalizing AI through MLOps.
Understand the evolving stakeholder landscape, including team makeup, involvement, and growth opportunities.


Findings

Investments Point to Staggering Growth, But Challenges Loom Large
Focus Sharpens on Improving Early and Late Stages of Data Science Lifecycle
Organizations Improve Their Ability to Shift Models to Production But Need Further Efficiencies
Data Science and Machine Learning Become a Team Sport, With Vendors Focused on Enabling All Stakeholders
Investments Point to Staggering Growth, But Challenges Loom Large

Primary Business Objectives Point Inward


Improving operational efficiency continues to be the linchpin of most business objectives driving data science and machine learning initiatives. It not only
empowers organizations to improve agility, cost-effectiveness, and customer centricity, but also lays the groundwork for sustainable growth and scale in an
increasingly data-driven world. Once operations are performing at optimal levels, organizations can focus more on other business imperatives. However, data
science and machine learning initiatives are also expected to improve product development, customer experience, risk management, and other areas.

Primary business objectives of data science and machine learning initiatives.

Improving operational efficiency 66%
Improving product development and innovation 60%
Enhancing customer experience/improving customer satisfaction 52%
Improving risk management 49%
Enhancing decision making 47%
Identifying new business opportunities and/or increasing revenue 43%


Budgets Are on the Rise

Nearly all (92%) organizations saw a year-to-year increase in budget allocation for data science and machine learning
projects/initiatives. These budgets are significant, with nearly one in four organizations (24%) planning to invest at least
$1 million in people, process, or technology in association with data science and machine learning over the next several years.
This heightened investment reflects an understanding that data science not only enhances operational efficiency but also enables
informed decision making, predictive analytics, and innovative product development. This financial support emphasizes the pivotal
role that data science and machine learning play in enabling the business to extract valuable knowledge from vast and complex data
sets, propelling organizations toward success in the digital age.

Change in budget for data science and machine learning projects/initiatives compared with previous year.

Increased (somewhat or significantly) 92%
Stayed the same 7%


Strategies Are Diverse When Prioritizing Data Science Projects


Only 7% of organizations prioritize projects with the shortest time to market, and only 13% prioritize projects that can be completed with
available resources. This willingness to sacrifice time to market and proceed with limited resources highlights the cautiously optimistic approach
organizations are taking. They recognize they can't afford to wait but also that they must ensure robust model development, thorough testing, and
accurate insights to avoid potentially costly errors. This deliberate and calculated approach can enhance long-term performance, reliability, and
stakeholder confidence, benefits that can far outweigh the initial time investment.

Prioritized approach to data science-related projects.

Business impact (i.e., projects with highest potential business impact) 23%
Technical complexity (i.e., projects with highest technical complexity) 23%
Executive leadership (i.e., priorities are dictated by the executive leadership team) 19%
Customer feedback (i.e., projects that address customer feedback) 14%
Resource availability (i.e., projects that can be completed with available resources) 13%
Time to market (i.e., projects with shortest time to market) 7%

88% of organizations agree that open source is critical to innovation in data science and machine learning.

The Art of Measuring Data Science Project Impact

Each data science project brings a distinct dimension to measuring impact. The closeness of the response rates across
areas is a testament to the diversity of approaches and use cases, highlighting the transformative power of data science
across domains. Because operational efficiency is the most common business driver for data science initiatives, it follows
that it is also the most common area measured to gauge the performance of those initiatives. Customer satisfaction and cost
savings are also commonly monitored to determine the impact of these initiatives.

Areas used to measure data science projects/initiatives.

Improved operational efficiency 53%
Customer satisfaction 48%
Cost savings or revenue generation 45%
Time savings 39%
Competitive advantage 37%
Predictive accuracy 37%
Innovation potential 36%
Employee satisfaction/happiness 35%
Social impact 26%


Challenges Loom Large

Nearly all (94%) organizations face challenges in developing and implementing data science projects.
Challenges come in several shapes and sizes:

Organizational: skilled talent, budgets, defining objectives, and measuring outcomes.
Data/environment: integrating with existing systems, data accessibility, limited tools, poor data quality, and siloed data.
Trust: data security/privacy, ethical concerns, and data governance.

Most significant challenges faced in developing and implementing data science projects.

Lack of skilled talent 27%
Insufficient integration with existing systems 25%
Limited budget and resources 23%
Lack of data access 22%
Difficulty measuring project outcomes 21%
Insufficient data security and privacy 21%
Limited availability of the right tools 20%
Difficulty defining project objectives 19%
Poor data quality 16%
Siloed data 16%
Ethical concerns 16%
Ineffective data governance 14%
We don't have any challenges 6%

Focus Sharpens on Improving Early and Late Stages of Data Science Lifecycle

Factors Weighed in Consideration of Data Science Purchases Highlight a Desire for Integration and Simplicity

Many organizations have already made massive investments in their data science and machine learning initiatives, so ensuring
they continue to see value from those investments is critical. This helps explain why integration with existing systems (34%)
tops the list of purchase considerations. The emphasis on ease of implementation and deployment (33%) highlights organizations'
desire to ramp up quickly and shorten the time between data generation and data insights. Note also that over a quarter (26%)
of organizations cite compatibility with open source technologies as an important factor, likely foreshadowing a larger open
source deployment trend moving forward.

Most important factors when considering purchases to support data science initiatives.

Integration with existing systems 34%
Ease of implementation and deployment 33%
Compatibility with open source technologies 26%
Alignment with the organization's strategic goals and vision 24%
Customer service and responsiveness 23%
Availability of training and resources 22%
Availability of a strong community and ecosystem 21%
Industry-specific presence 21%
User adoption and engagement 20%
Vendor stability and financial viability 19%
Overall reputation of the vendor 18%
Customer case studies and proof points 16%
Partner ecosystem 12%


Significant Room for Improvement Moving Models to Production


Within the last year, organizations have made great strides in operationalizing machine learning models and transitioning them into production
environments. With robust frameworks and automated pipelines for model training, validation, and deployment, the industry has seen more seamless
integration into existing systems, as well as streamlined processes that enable faster iterations. At the root of this success is the advent of MLOps
practices that promote collaboration between data and IT stakeholders. Despite this progress, however, the rate at which organizations deploy machine
learning models into production leaves significant room for improvement. For example, 45% of organizations see less than 25% of their models
make it into production. Challenges persist that require ongoing attention across the entire model lifecycle, from initial development through continuous
monitoring and maintenance, to deal with model drift, performance degradation, interpretability issues, and more.

Figure: Percentage of machine learning models deployed into production environments. Response options: less than 5%, 5% to 10%, 11% to 25%,
26% to 50%, 51% to 75%, more than 75%, we have not yet deployed an ML model into production, and don't know.

The Importance of Data Cannot Be Overstated


Data accessibility and data preparation go hand in hand. Data access forms the foundation for the entire data science lifecycle, which explains not only why
it is the step most commonly performed on a regular basis (51%) but also why it is most often cited as the most challenging step (15%). Data preparation,
including cleansing, structuring, and transforming data, is a necessary step to ensure that subsequent analytical experiments rest on a reliable and accurate basis.

Data science lifecycle steps performed on a regular basis.

Data access 51%
Data preparation 50%
Model monitoring and maintenance 40%
Exploratory data analysis 36%
Model development/feature engineering 35%
Model validation 35%
Model retraining 32%
Model deployment 32%
Model interpretation and communication 31%
Problem formulation 28%

Most challenging data science lifecycle steps.

Data access 15%
Data preparation 14%
Exploratory data analysis 12%
Model monitoring and maintenance 11%
Model development/feature engineering 11%
Model interpretation and communication 9%
Model deployment 8%
Model validation 6%
Problem formulation 6%
Model retraining 5%
No steps cause challenges 3%

Organizations Improve Their Ability to Shift Models to Production But Need Further Efficiencies

Unpacking Challenges in ML Deployment and Monitoring


Considering that 58% of organizations have significant room to improve their processes for moving models into production, it makes sense that even the most mature
organizations run into challenges. Technical complexities arise when integrating models into existing infrastructure, ensuring compatibility with various systems,
and encountering unexpected real-world data variability. Compliance and governance challenges affect reliability and trust, and they introduce risk. Operational
complexities also arise, such as maintaining model performance over time and identifying and responding to failures. Continuous monitoring poses its own challenges,
such as addressing data drift and managing model dependencies (for example, model versioning).

Challenges with deployment and monitoring of machine learning models.

Difficulty managing multiple environments 35%
Difficulty ensuring compliance with corporate governance policies; difficulty detecting and responding to data drift; inconsistent model performance in production 29% to 33% each
Difficulty detecting and responding to model failures 29%
Inefficient retraining processes 26%
Difficulty managing dependencies 26%



Striking a Balance Between Retraining and Maintaining

With 47% of organizations retraining models on at least a weekly basis, it is important to understand the impact frequent
retraining can have on an organization, from resource strain and inefficiency to amplified data noise and versioning
complexity. While retraining in response to data drift is important, doing so excessively can disrupt operations, confuse
users, and divert strategic focus from critical deployment aspects like monitoring and ethics. Organizations must balance
retraining frequency against its potential downsides. A well-defined strategy for model monitoring and maintenance that
factors in benefits, costs, and impact is essential to making the right decisions about the optimal retraining schedule.

Figure: Frequency of retraining machine learning models in production. Response options: daily, weekly, monthly, quarterly,
yearly, only when new data is available, only when accuracy falls below a certain threshold, only when objectives change,
and don't know.
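The survey reports retraining practices but does not prescribe a policy. As a hypothetical illustration of the balance described above, the sketch below combines an accuracy threshold (one of the survey's response options) with a minimum interval between retraining runs, so that degraded accuracy triggers retraining without letting it happen excessively. The class and parameter names (RetrainingPolicy, accuracy_floor, min_interval) and the 0.90 and seven-day values are assumptions chosen for illustration, not findings from the research.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RetrainingPolicy:
    """Hypothetical policy: retrain only when accuracy drops below a floor,
    and never more often than a minimum interval allows."""
    accuracy_floor: float            # assumed threshold, not a survey finding
    min_interval: timedelta          # cooldown that guards against excessive retraining
    last_retrained: Optional[datetime] = None

    def should_retrain(self, current_accuracy: float, now: datetime) -> bool:
        # Accuracy is still acceptable, so no retraining is needed.
        if current_accuracy >= self.accuracy_floor:
            return False
        # Accuracy has degraded, but respect the cooldown window.
        if self.last_retrained is not None and now - self.last_retrained < self.min_interval:
            return False
        return True

    def record_retrain(self, now: datetime) -> None:
        # Remember when the model was last retrained.
        self.last_retrained = now


if __name__ == "__main__":
    policy = RetrainingPolicy(accuracy_floor=0.90, min_interval=timedelta(days=7))
    start = datetime(2023, 6, 1)
    print(policy.should_retrain(0.93, start))  # False: accuracy above the floor
    print(policy.should_retrain(0.85, start))  # True: degraded, no prior retrain
    policy.record_retrain(start)
    print(policy.should_retrain(0.85, start + timedelta(days=3)))  # False: within cooldown
    print(policy.should_retrain(0.85, start + timedelta(days=8)))  # True: cooldown elapsed
```

A cooldown is only one way to curb excessive retraining; an organization could equally gate retraining on drift metrics or a business review, depending on the benefits, costs, and impact it weighs.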

Data Science and Machine Learning Become a Team Sport, With Vendors Focused on Enabling All Stakeholders

Building Bridges for Collaborative Data Science Success

Collaboration among stakeholders and team members is vital for successful data science initiatives. Organizations employ
tools and methods to integrate expertise, fostering constructive dialogue, strategy refinement, and collective guidance.
This open communication empowers diverse roles to shape outcomes, enhancing analysis quality and propelling organizations
toward transformative insights and decisions.

Sources used to ensure collaboration between stakeholders and other team members on data science initiatives.

Data visualization tools 46%
Virtual workspaces 45%
Data science community forums and marketplaces 44%
Data science/machine learning platforms 43%
General purpose help groups/forums 32%
Code repositories/version control systems 30%
Open source community forums 27%
Agile methodologies 23%
Shared notebooks 22%
Pair programming or peer code review 20%
Hackathons 9%

Mapping Stakeholder Involvement Across the Data Science Lifecycle

Non-data science stakeholders play a significant role across the data science lifecycle, influencing stages from data
collection and preprocessing to model deployment and model management. This broad involvement is likely a big reason why
92% of respondents rated the experience of business stakeholders involved in data science initiatives and working with
data science teams as positive or very positive. Creating data science and machine learning solutions that cater to the
non-data science community presents significant opportunities for vendors as organizations move forward in data science,
regardless of their level of data science expertise.

Machine learning model building areas that involve non-data science professionals (e.g., business analysts).

Data collection/supply 44%
Data preprocessing 40%
Model deployment 39%
Model monitoring/maintenance 38%
Model training 36%
Model evaluation 36%
Model selection 30%
Logic building 28%
Use case/problem definition 27%


Unlocking Employee Potential

99% of respondents are motivated to improve their data science and machine learning skills.

With 99% of respondents motivated to improve their data science and machine learning skills, the research highlights that
this drive is fueled by a combination of intrinsic and extrinsic motivations. The prospects of career advancement, recognition,
and salary increases, along with the promise of contributing meaningfully to cutting-edge projects, act as powerful external
motivators. This combination of tangible rewards and intellectual curiosity creates an interesting dynamic within the work
environment, where employees are inspired to invest time (sometimes outside of work) to continue honing their skills.

Employees' drivers to improve skills in data science and machine learning.

Career advancement opportunities 52%
Keeping pace with industry trends 50%
Job security 45%
Employer requirements 44%
General interest in the field 40%
Salary increase 40%
Personal fulfillment 35%


KNIME helps everybody make sense of data.

Its free and open-source KNIME Analytics Platform enables anyone, whether they come from a business, technical, or data
background, to intuitively work with data every day. KNIME Business Hub
is the commercial complement to KNIME Analytics Platform and enables users to collaborate on data
science and share insights across the organization. Together, the products support the complete data
science lifecycle, allowing teams at all levels of analytics readiness to support the operationalization of
data and to build a scalable data science practice.



Research Methodology and Demographics


To gather data for this report, ESG conducted a comprehensive online survey of data professionals from private- and public-sector organizations in North America (United States
and Canada) between June 5, 2023, and June 27, 2023. To qualify for this survey, respondents were required to be involved with data science and machine learning technologies
and processes, including potential responsibility for strategizing, evaluating, purchasing, building, and managing these technologies. All respondents were provided an incentive
to complete the survey in the form of cash awards and/or cash equivalents.

After filtering out unqualified respondents, removing duplicate responses, and screening the remaining completed responses (on a number of criteria) for data integrity, we were
left with a final total sample of 366 data professionals.

Respondents by number of employees:
100 to 499 20%
500 to 999 22%
1,000 to 2,499 17%
2,500 to 4,999 14%
5,000 to 9,999 14%
10,000 to 19,999 7%
20,000 or more 6%

Respondents by years:
Fewer than 5 years 1%
5 to 10 years 17%
11 to 20 years 48%
21 to 50 years 25%
More than 50 years 10%
Don't know 1%

Respondents by industry:
Manufacturing 37%
Financial services 13%
Technology 10%
Healthcare 7%
Retail/wholesale 7%
Communications and media 6%
Business services 5%
Government 1%
Other 14%



All product names, logos, brands, and trademarks are the property of their respective owners. Information contained in this publication has been obtained from sources TechTarget, Inc. considers to be reliable but is not warranted by TechTarget, Inc.
This publication may contain opinions of TechTarget, Inc., which are subject to change. This publication may include forecasts, projections, and other predictive statements that represent TechTarget, Inc.'s assumptions and expectations in light of
currently available information. These forecasts are based on industry trends and involve variables and uncertainties. Consequently, TechTarget, Inc. makes no warranty as to the accuracy of specific forecasts
contained herein.

This publication is copyrighted by TechTarget, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express
consent of TechTarget, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact Client Relations at [email protected].

Enterprise Strategy Group is an integrated technology analysis, research, and strategy firm providing market intelligence,
insight, and go-to-market content services to the global technology community.
