Executives tout digital transformation programs as being able to deliver process improvements and productivity increases with a meaningful impact on the bottom line. An equally important corollary, however, is the generation of vast amounts of data about employees, systems, transactions, and customers. The accumulation of those data assets spells countless opportunities to unlock new value, with the only question being how. Enter data scientists.
What is Data Science and how is it connected to Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL)?
Data science is now an umbrella term that encompasses a wide range of reporting, forecasting, and analytic activities which usually leverage a vast amount of (mostly internal) information. The latter has come to be known as “big data” and is often characterized by the 4V definition – possessing large volume, significant variety, high velocity of change, and some uncertainty, or veracity. A precise definition of data science has remained elusive, but seasoned professionals often joke that if it cannot be done in Excel, it is probably data science.
Initially, the data scientist looked a lot like the webmaster of the nineties – a versatile professional with a broad understanding of everything related to the field, from data collection, database management, processing, and mathematical and statistical modeling through reporting and visual storytelling. As the field matures, those roles are becoming more separate and specialized. Generally, data engineers take care of storage, databases, extraction, and possibly processing, while data scientists focus on modeling, calibration, inference, and visualization. The new hero of real-time analytics – the MLOps engineer – is responsible for putting the solution into production. If you are in the tech sector or in an organization of at least medium size, chances are that you have some of those roles in your company, too.
Some of those teams deliver stellar results, but most struggle to make a meaningful impact. In one of many examples, Gartner predicts that through 2022, only 20% of analytic insights will drive business outcomes. In fact, in an early research paper, Becker shows that the number one reason for data science project failure is the lack of adequate skills, closely followed by setting incorrect business objectives. Third place goes to insufficient ROI or the lack of a proper business case. In short, 62% of failures are due to organizational issues.
This may be partly due to the marked focus of data science teams on generating sophisticated models that either do not address meaningful business needs or present over-engineered solutions that overwhelm the team itself, the DevOps function, and even the management that strives to show value. On the one hand, this has led to the rapid explosion of new machine learning algorithms, with neural networks being the workhorse for industrial applications. As they grow in sophistication, they enable so-called deep learning, which excels at complex tasks such as image and video recognition – and even generation! This spurred many a pundit – and a few sales guys, too – to coin the term artificial intelligence (AI) to describe such advanced machine learning solutions.
While the current generation of AI is far from the capabilities of the Terminator, and experts do not expect it to get there for the next 40 years or so, it still provides vast opportunities for businesses to capture value. In an oft-cited study, Nucleus Research estimates that a successful analytic project may be able to deliver up to USD 13 per dollar spent, or a 13x ROI. As data science and AI maturity increase, it is highly likely that those multiples will go down, but analytics will still remain a robust investment.
Benefits of data science: how exactly does it transform businesses?
At first glance, it may seem that a data science initiative is a lot like a game of reverse Russian roulette – with high chances of failure (up to 85% according to KDnuggets) and outsized returns for success. Still, the failures are largely predictable: in 2 out of 3 organizations they are due to organizational, management, and governance issues. The key to the success of your project or program is understanding how benefits are transmitted all the way from the data in a transaction system to the company’s EBITDA. At Prime, we like to take a structured approach by outlining the key benefits behind an analytic initiative and then evaluating them to select the most probable candidates for success. The rest is a relentless focus on implementation.
We see a few key benefits that data science can bring (and a few it cannot). Those value drivers fall (not so) neatly into a few groups:
- Supercharged marketing and sales – the marketing and sales teams are a natural starting place for an analytic breakthrough. While many teams already use the in-built reporting functionalities of popular CRM and digital marketing solutions, only very few go beyond reporting and into modeling. Still, using the plethora of well-structured, high-quality granular data on prospects and customers, data scientists may train models that show which leads are most likely to convert, A/B-test marketing campaigns and content, and see what drives sales and revenues per channel. All this realigns S&M teams and allows management to funnel resources to the highest-ROI activities, thus increasing lift and revenue.
- Increased customer satisfaction and retention – many organizations sit on immense quantities of insightful data on their customers, generated as users make their way through the digital products and offerings. Data scientists are able to plough through this information and outline what the most useful features are, what the client needs are, and importantly – what drives NPS scores, repurchases, and churn decisions. Knowing what customers want and why they quit enables the product and customer success teams to act rapidly based on sound data and improve the product relentlessly while constantly measuring the impact.
- Internal process improvements – back in the day, when process optimization was desired, consultants would come, ask questions, and try to imagine how things could be improved – e.g. by relocating an activity or a control, automating steps, or reducing decision forks. Nowadays, most (if not all) processes in modern organizations take place in or leave a trace within some IT system, often in the form of logs. Those can be collected by the data science team, and through process-mining techniques the actual process steps and activities are reconstructed as they are, not as we imagine they ought to be. Automated recognition of process bottlenecks yields robust ideas for improvements, supplementing hunch with evidence.
- Better operational and tactical decisions – as organizations embrace data-driven decision-making, they will need to make sure that all stakeholders are presented with easy-to-access, easy-to-understand data that is relevant to their needs. While one may be stymied by dozens of different dashboards and hundreds of prediction-generating services, we should keep in mind that employees and managers only look at what matters to them to make a difference in their job. Thus, the proliferation of automated, real-time, relevant reporting empowers everyone to make better decisions, eking out productivity increases as they go.
- Improved strategic process – as more and more data accumulate, organizations get swamped in a large number of possibly conflicting analytics. So much data starts to obscure the strategic vision instead of supporting it. This is why data science teams often work on the quality of the data itself by spearheading data governance and master data management (MDM) initiatives. Oftentimes a single source of truth is established for reporting and strategic purposes – with either a data warehouse or a data lake underneath to power it, and extensive high-level management analytics and visualizations to align executive teams and enable fast-paced data-driven decisions in the face of constantly changing market realities.
- Better controlling – the introduction of data governance greatly enhances controlling and pushes it to the next level from purely reporting backward-looking financial and operational metrics at period end to a continuous learning process based on the ingestion of real-time data. Moreover, data science teams are able to identify leading indicators for business success that make a forward-looking perspective on the organization a very real possibility. Finally, trend identification and forecasting can be used for counterfactual analyses that help separate individual contributions from context change for better performance measurement.
- Accelerated innovation – the Holy Grail of a data science initiative is to support a smart organization in delivering continuous innovation based on customer, market, and operational data. While analytics is an excellent tool for delivering sustainable innovation through process improvement, it can be harnessed by strategic and product teams to discern the next big opportunities and usher in a structured innovation process that delivers on them. In fact, a 2021 study of 253 top managers by Ciampi et al. found that big data analytic capabilities explain more than 40% of the variance in business model innovations. Add an entrepreneurial mindset, and together these explain more than 50%.
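To make the first of those drivers concrete, the lead-scoring idea can be sketched in a few lines of Python. Everything here is hypothetical – the feature names (`page_views`, `email_opens`, `demo_requested`) and the tiny CRM snapshot are invented – and a real model would be trained on far more data with a proper ML library, but the mechanics are the same:

```python
import math

# Hypothetical CRM snapshot: one row per lead, label = 1 if the lead converted.
leads = [
    ((12, 5, 1), 1), ((2, 0, 0), 0), ((8, 3, 1), 1), ((1, 1, 0), 0),
    ((15, 6, 1), 1), ((3, 2, 0), 0), ((9, 4, 0), 1), ((2, 0, 0), 0),
]

def sigmoid(z):
    # Clamp to avoid math.exp overflow on extreme logits.
    if z < -60.0:
        return 0.0
    if z > 60.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.1, epochs=500):
    """Fit a tiny logistic regression with plain gradient descent."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def score(w, b, x):
    """Conversion probability for a new lead."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = train(leads)
hot = score(w, b, (14, 5, 1))   # highly engaged lead
cold = score(w, b, (1, 0, 0))   # barely engaged lead
```

Ranking prospects by these probabilities is exactly what lets sales teams funnel effort toward the highest-ROI leads.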
At any rate, the keyword in analytic project discussions is relevance: better data insights align organizations and empower employees and management to make rapid decisions, solving relevant problems in a timely manner. The convergence of advanced technology and a receptive organizational culture is essentially what drives the outsized ROIs of successful projects.
The business impact of adopting a data science strategy
It is little surprise that efficient introduction or expansion of the data science function needs a proper strategy. As early as 2017, a Harvard Business Review article by Davenport advised executives that they need not one but two distinct data strategies – a distinction that was destined to live on. A defensive data strategy is all about, well, defending your data. It recognizes that under the pressure of heightened cybersecurity threats and ever-increasing regulatory oversight, organizations must achieve a solid data security posture and be able to respond to compliance requirements.
To this end, they leverage a defensive strategy that is heavily focused on data governance initiatives that catalog and classify data according to policy, introduce access controls, and expand the scope of data management. This is valuable, as data begins to be treated as the valuable asset it is – all the way from creating an inventory, improving metadata, and assigning a price tag and an exposure level, to designating data champions to work with business functions in order to create value while remaining compliant. There is still a long way to go to reach this state – NewVantage Partners’ 2022 study indicates that only 39.7% of surveyed large organizations manage data as an enterprise asset. At any rate, heavily regulated industries such as financial or medical services will find good reasons to introduce many elements of the defensive strategy.
On the other end of the spectrum, one finds the offensive data strategy, which prizes flexibility and is fixated on unlocking value and driving innovation through data. This is usually much heavier on analytics, modeling, and visualization with data that is “good enough”. While data governance is also beneficial here, it is considered the infrastructure upon which the real action takes place. In this case, the data modeler and the business people take the leading roles, while the data engineer and the compliance manager merely support them. It is usually through an offensive data strategy that the business benefits of an analytic project are fully realized, which is why less regulated industries such as retail focus heavily on this approach. In reality, an organization needs both, but depending on its competitive environment, level of regulation, consumer expectations, and internal capabilities and culture, it will lean more towards one. Nowadays, 47% of organizations are competing on data and analytics, according to the NVP surveys, and this share is only likely to increase, putting even more pressure on companies to go on the offensive.
Managing data science processes in the organization
Once a combination of defensive and offensive strategic elements is firmly established, the real data science action begins: creating a team and embedding it within the organization, pinpointing technologies, and starting to tackle pressing business needs. This usually takes strong and visible leadership. From its humble beginnings as a reporting team within the finance or sales/marketing arm, analytics is now an established function within the organization with its own senior management. Those teams often report directly to a C-level officer of the company, with NVP reporting that in 2022 about 74% of the surveyed large companies had Chief Data and Analytics Officers (CDAO), a spectacular rise from 12% just a decade ago. The CDAO may report to the Board, the CEO, or another C-level officer such as the CTO, but in any case has a voice that resonates and sufficient resources to drive a large-scale organizational transformation.
Staffing the team with the right talent is the next challenge, and many organizations find it beneficial to batch hire data scientists, thus forming the core of the team. At Prime, we have found it particularly useful to combine senior and junior data scientists who grow together. There is a catch, though – while in classic software engineering we tend to have a senior developer supervising a few more junior ones, in data science practice we have a team composed overwhelmingly of senior people supervising far fewer juniors. We are, in effect, inverting the pyramid. This team composition is predicated on the fact that data science sits between IT and business, and analytics professionals need a strong grasp of product and revenue implications, preferably along with some subject matter expertise. Given that, the demands of growing many juniors through extensive coaching can overwhelm the seniors, making the whole endeavor much less effective. What is more, following the rigorous experimentation approach outlined in methodologies such as CRISP-DM requires a sufficient number of seasoned professionals.
All teams benefit from a relative standardization of tooling. While the adage “whatever works” seems irresistibly tempting, some tools do perform better than others for specific tasks. For structured data, SQL remains king, while unstructured or extremely large datasets call for a NoSQL solution, often a Hadoop-based one. We like to use the R programming language for rapid prototyping due to its versatility and relative ease of use. However, when it comes to writing industrial-grade code that can perform in production, Python is the usual choice. Its robustness is also reflected in its popularity – the Anaconda State of Data Science 2021 report asked 4,299 respondents what languages they used, and 63% said they used Python frequently or always. This trend will only accelerate – 88% of students reported being taught Python in preparation for an analytics career. Both Python and R offer decent visualization capabilities for reporting purposes, and commercially available dashboard solutions such as PowerBI or Tableau can take it to the next level.
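As a minimal illustration of why SQL remains king for structured data, here is the kind of quick aggregation a data scientist might prototype in Python using the standard library’s sqlite3 module. The table name, channels, and revenue figures are all invented for the example:

```python
import sqlite3

# Hypothetical orders table, built in an in-memory database for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (channel TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("web", 120.0), ("web", 80.0), ("retail", 200.0),
     ("retail", 50.0), ("web", 40.0)],
)

# The one-line aggregation that would take far more code without SQL:
rows = conn.execute(
    "SELECT channel, SUM(revenue) AS total FROM orders "
    "GROUP BY channel ORDER BY total DESC"
).fetchall()
# rows == [('retail', 250.0), ('web', 240.0)]
```

The same query runs unchanged against a production warehouse, which is precisely what makes SQL prototypes so easy to promote.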
Many teams are now moving toward fully leveraging the cloud infrastructure for their data science needs, thus embracing the MLOps way of work. Commercially available clouds such as Azure, AWS, or Google Cloud have solutions that cover all parts of the analytic pipeline – from data storage through ETL (Extract, Transform, Load) processes all the way into advanced analytics. There is now tooling to train a machine learning (ML) model by merely drawing the pipeline stages and setting a few parameters through a graphical user interface. The computational resources can also easily scale up. From a business perspective, this also makes sense – the company only pays for what the team uses, and it is easier to manage such OpEx expenses. While it is difficult to pinpoint the perfect cloud provider, data science teams should keep two things firmly in mind. First, it is always a good idea to stick to one and the same ecosystem within the company. Small benefits in additional features from another cloud are dwarfed by compatibility issues, learning curve effects, and general confusion. Second, intimate knowledge of the cloud provider can lead to order-of-magnitude savings by choosing a solution that is only slightly more constrained but a few times cheaper.
Use cases: how data science drives product innovation
Thinking about the data science journey may make it seem linear and discrete. Of course, it is not – the team, its composition, tooling, and tasks are always evolving. What glues everything together is a focus on creating value and driving innovation. Usually, pertinent data science problems are posed by business stakeholders. When we kick off a data science project for a client, we at Prime always make sure to ask them what they would change if they had a magic wand. Amazingly, most of those problems can be tackled with data! Each organization can define its most pressing needs and let analytics address them.
For example, when we worked with one of the largest payment operators, management identified ever-growing fraud as a constant concern. A new predictive model providing real-time classification and action alerts let them decrease losses from fraud by 51%. Another customer – a large medical devices company – was concerned about increasing churn. Sophisticated market modeling and analysis through Bayesian networks showed exactly what sales reps needed to focus on, driving revenue up by 18%. Finally, a banking client of ours had issues with remote identification of customers. A combination of AI-enabled image recognition and automated data ingestion made sure they got it right in 96% of cases, dramatically decreasing the need for human intervention and cutting costs.
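For a flavor of how such real-time alerting works, here is a deliberately naive sketch in Python: it flags a transaction when its amount sits far from the account’s historical mean. The account IDs, figures, and the 3-sigma threshold are all invented for illustration – the production model described above was, of course, far more sophisticated than this rule of thumb:

```python
from statistics import mean, stdev

# Hypothetical transaction history per account; all figures are invented.
history = {
    "acct_1": [20.0, 25.0, 22.0, 19.0, 24.0],
    "acct_2": [500.0, 480.0, 510.0, 495.0],
}

def is_suspicious(account, amount, k=3.0):
    """Flag a transaction sitting more than k standard deviations from
    the account's historical mean - a simple stand-in for a real-time
    fraud classifier."""
    past = history[account]
    mu, sigma = mean(past), stdev(past)
    return abs(amount - mu) > k * sigma

# Score an incoming batch and collect action alerts.
alerts = [
    (acct, amt)
    for acct, amt in [("acct_1", 23.0), ("acct_1", 400.0), ("acct_2", 505.0)]
    if is_suspicious(acct, amt)
]
# alerts == [('acct_1', 400.0)]
```

The pattern – score each transaction as it arrives, alert only on outliers – is the same one a production classifier follows, just with a learned model in place of the z-score rule.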
Among large companies, 97% are investing in data initiatives in 2022…
…and among those – 91% are putting their money into AI as well (NVP). With the current state of the art, most of those projects are likely to fail, but the winners will win big. At Prime, we specialize in delivering success and will be happy to accompany you through your data science and analytics journey, all the way into the stratosphere.
Do you need a partner in navigating through times of change?
Book a consultation with our team of experts to start your data science journey efficiently, with the right team on your side.