State of Data Science and Machine Learning
Research Objectives
Several challenges are preventing organizations from successfully integrating machine learning (ML) models into their
software development lifecycle. Bridging the gap between different skill sets, handling complex and large data sets, managing
specialized hardware, and ensuring availability, scalability, and security in production collectively delay time to value and cause
organizational bottlenecks.
Due to the increasing interest in and complexity of machine learning projects, organizations need improved agility, efficiency,
and performance, with risk reduction through right-sized governance. Organizations recognize that they need clear data science
and machine learning strategies. As part of these strategies, MLOps can provide a structured and standardized approach to
developing, deploying, and maintaining ML models in production to see greater value. To gain further insight into these trends,
TechTarget’s Enterprise Strategy Group (ESG) surveyed 366 professionals at organizations in North America (US and Canada)
involved with data science and machine learning technologies and processes, including potential responsibility for strategizing,
evaluating, purchasing, building, and managing these technologies.
Identify investment plans, objectives, and challenges of data science and machine learning initiatives and projects.
Determine how organizations are prioritizing solutions to best help them succeed.
© 2023 TechTarget, Inc. All Rights Reserved.
Investments Point to Staggering Growth, But Challenges Loom Large
[Figure: Change in budget for data science and machine learning projects/initiatives compared with previous year. Legible labels: Increased significantly, 43%; Increased somewhat (value not legible); Stayed the same, 7%.]

… for data science and machine learning projects/initiatives. These budgets are significant, with nearly one in four organizations (24%) planning to invest at least $1 million in people, process, or technology in association with data science and machine learning over the next several years. This heightened investment reflects an understanding that data science not only enhances operational efficiency but also enables informed decision making, predictive analytics, and innovative product development. This financial support emphasizes the pivotal role that data science and machine learning play in enabling the business to extract valuable knowledge from vast and complex data sets, propelling organizations toward success in the digital age.
Business impact (i.e., projects with highest potential business impact): 23%
Technical complexity (i.e., projects with highest technical complexity): 23%
Time to market (i.e., projects with shortest time to market): 7%
Resource availability (i.e., projects that can be completed with available resources): 13%
Customer feedback (i.e., projects that address customer feedback): 14%
Executive leadership (i.e., priorities are dictated by the executive leadership team): 19%

88% of organizations agree that open source is critical to innovation in data science and machine learning.
[Figure: Percentage of machine learning models deployed into production environments. Response categories: Less than 5%; 5% to 10%; 11% to 25%; 26% to 50%; 51% to 75%; More than 75%; We have not yet deployed an ML model into production; Don’t know.]
[Figure: Data science lifecycle steps performed on a regular basis.]
[Figure: Most challenging data science lifecycle steps.]
Striking a Balance Between Retraining and Maintaining

With 47% of organizations retraining models on at least a weekly basis, it is important to understand the impact frequent retraining can have on an organization, from resource strain and inefficiency to amplifying data noise and creating versioning complexities. While making changes via retraining based on data drift is important, doing so excessively can disrupt operations, confuse users, and hinder strategic focus on critical deployment aspects like monitoring and ethics. Organizations must balance retraining frequency and the potential downsides associated with it. A well-defined strategy for model monitoring and maintenance that factors in benefits, costs, and impact is essential to making the right decisions about the optimal retraining schedule.

[Figure: Frequency of retraining machine learning models in production. Response categories: Daily; Weekly; Monthly; Quarterly; Yearly; Only when new data is available; Only when accuracy falls below a certain threshold; Only when objectives change; Don’t know.]
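
One way to picture the balance between retraining frequency and its downsides is a policy that retrains only when accuracy falls below a certain threshold, and never more often than a set cool-down period. The sketch below is illustrative only: the class name `RetrainPolicy`, the 0.90 accuracy floor, and the seven-day cool-down are assumptions chosen for the example, not values or methods from the survey.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RetrainPolicy:
    """Hypothetical policy: retrain on accuracy decay, with a cool-down."""

    accuracy_threshold: float = 0.90             # assumed accuracy floor
    min_interval: timedelta = timedelta(days=7)  # assumed cool-down period

    def should_retrain(
        self, current_accuracy: float, last_retrained: datetime, now: datetime
    ) -> bool:
        # Skip retraining while the cool-down has not elapsed, which limits
        # resource strain and the number of model versions to manage.
        if now - last_retrained < self.min_interval:
            return False
        # Otherwise retrain only if accuracy has dropped below the floor.
        return current_accuracy < self.accuracy_threshold
```

For example, with the defaults above, a model last retrained ten days ago and now scoring 0.85 would be flagged for retraining, while the same score two days after a retrain would not. Raising or lowering either parameter is how a team would trade responsiveness to drift against operational cost.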
Hackathons 9%
Mapping Stakeholder Involvement Across the Data Science Lifecycle

Data collection/supply: 44%
Data preprocessing: 40%
Model deployment: 39%
With 99% of people motivated to improve their data science and machine learning skills, the research highlights that improvements are fueled by a combination of intrinsic and extrinsic motivations. The prospects of career advancement, recognition, and salary increases, along with the promise of contributing meaningfully to cutting-edge projects, act as powerful external motivators. This combination of tangible rewards with intellectual curiosity creates an interesting dynamic within the work environment where employees are inspired to invest time (sometimes outside of work) to continue honing their skills.

Keeping pace with industry trends: 50%
Job security: 45%
Employer requirements: 44%
General interest in the field: 40%
Salary increase: 40%
Personal fulfillment: 35%
Its free and open-source KNIME Analytics Platform enables anyone (whether they come from a
business, technical, or data background) to work with data intuitively, every day. KNIME Business Hub
is the commercial complement to KNIME Analytics Platform and enables users to collaborate on data
science and share insights across the organization. Together, the products support the complete data
science lifecycle, allowing teams at all levels of analytics readiness to support the operationalization of
data and to build a scalable data science practice.
After filtering out unqualified respondents, removing duplicate responses, and screening the remaining completed responses (on a number of criteria) for data integrity, we were
left with a final total sample of 366 data professionals.
[Figure: respondent demographics. Legible values: 500 to 999, 22%; 1,000 to 2,499, 17%; 2,500 to 4,999, 14%; Business services, 5%; Government, 1%; Other, 14%; 11 to 20 years, 48%.]
This publication is copyrighted by TechTarget, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express
consent of TechTarget, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact Client Relations at [email protected].