• Skip to main content
  • Skip to secondary menu
  • Skip to primary sidebar
  • Skip to footer
  • Articles
  • News
  • Events
  • Advertize
  • Jobs
  • Courses
  • Contact
  • (0)
  • LoginRegister
    • Facebook
    • LinkedIn
    • RSS
      Articles
      News
      Events
      Job Posts
    • Twitter
Datafloq

Datafloq

Data and Technology Insights

  • Categories
    • Big Data
    • Blockchain
    • Cloud
    • Internet Of Things
    • Metaverse
    • Robotics
    • Cybersecurity
    • Startups
    • Strategy
    • Technical
  • Big Data
  • Blockchain
  • Cloud
  • Metaverse
  • Internet Of Things
  • Robotics
  • Cybersecurity
  • Startups
  • Strategy
  • Technical

How to Structure a Data Science Team: Key Models and Roles to Consider

Oleksandr Kolisnykov / 9 min read.
January 23, 2018
Datafloq AI Score
×

Datafloq AI Score: 73

Datafloq enables anyone to contribute articles, but we value high-quality content. This means that we do not accept SEO link building content, spammy articles, clickbait, articles written by bots and especially not misinformation. Therefore, we have developed an AI, built using multiple built open-source and proprietary tools to instantly define whether an article is written by a human or a bot and determine the level of bias, objectivity, whether it is fact-based or not, sentiment and overall quality.

Articles published on Datafloq need to have a minimum AI score of 60% and we provide this graph to give more detailed information on how we rate this article. Please note that this is a work in progress and if you have any suggestions, feel free to contact us.

floq.to/nt8Sc

If you’ve been following the direction of expert opinion in data science and predictive analytics, you’ve likely come across the resolute recommendation to embark on machine learning. As James Hodson in Harvard Business Review recommends, the smartest move is to reach for the low hanging fruit and then scale for expertise in heavier operations.

Just recently we talked about machine-learning-as-a-service (MLaaS) platforms. The main takeaway from the current trends is simple. Machine learning becomes more approachable for midsize and small businesses as it gradually turns into a commodity. The leading vendors Google, Amazon, and Microsoft provide APIs and platforms to run basic ML operations without a private infrastructure and deep data science expertise. In the early stages, taking this lean and frugal approach would be the smartest move. As analytics capabilities scale, a team structure can be reshaped to boost operational speed and extend an analytics arsenal.

How to implement this incremental approach? This time we talk about data science team structures and their complexity.

Data science team structures

Embarking on data science and predictive analytics requires a clear understanding of how the initiative is going to be introduced, maintained, and further scaled regarding the team structure. We recommend considering three basic team structures that match different stages of machine learning adoption.

IT-centric structure

Sometimes, hiring data scientists is not an option, and you must leverage talent that’s already in-house. The main analytics and leadership role would be a business translator, usually referred to as a chief analytics officer (CAO) or chief data officer (CDO). The latter term gradually becomes redundant as most data processes are reshaped towards predictive analytics. This person should be capable of leading the initiative. We’ll take a more detailed look at the position below.

All the rest data preparation, training models, creating user interfaces, and model deployment within a corporate IT infrastructure ‘ can be largely managed by the IT department (if your organization actually has a fully functioning, in-house IT department). This approach is fairly limited, but it can be realized by using MLaaS solutions. Environments like Azure Machine Learning or Amazon Machine Learning are already equipped with approachable user interfaces to clean datasets, train models, evaluate them, and deploy.

Azure Machine Learning, for instance, supports its users with detailed documentation for a low entry threshold. This allows for fast training and early deployment of models even without an expert data scientist on board.

On the other hand, MLaaS solutions present their limitations in terms of machine learning methods and cost. All operations, from data cleaning to model evaluation, have their separate prices. And considering that the number of iterations to train an effective model can’t be estimated in advance, working with MLaaS platforms entails some budget uncertainty.

Pros of IT-centric structure:

  • Leverage new investments with existing IT resources
  • Computing infrastructure is provided and maintained by an external service
  • In-house specialists can be trained to realize predictive analytics potential further
  • Cross-silo management is reduced as all operations are held within the IT department
  • Less time-to-market for relatively simple machine learning tasks requiring one or a few models

Cons of IT-centric structure:

  • Limited machine learning methods and data cleaning procedures that these services provide
  • Model training, testing, and prediction should be paid for. This entails uncertainty of eventual cost per prediction as the number of needed iterations can’t be estimated in advance

Integrated structure

With the integrated structure, a data science team focuses on dataset preparation and model training, while IT specialists take charge of the interfaces and infrastructure supporting deployed models. Combining machine learning expertise with IT resource is the most viable option for constant and scalable machine learning operations.

Unlike the IT-centric approach, the integrated method requires having an experienced data scientist on a team and an elaborate recruitment effort beforehand. This ensures better operational flexibility in terms of available techniques. Besides end-to-end and yet limited services, you can leverage deeper machine learning tools and libraries like Tensor Flow or Theano that are designed for researchers and experts with data science backgrounds. With this effort allocation, you can address highly specific business problems and choose between as-a-service and custom-built ML solutions.

Pros of integrated structure:

  • Leveraging existing IT resources and investments
  • Data scientists focus on innovation
  • Utilizing full potential of both as-a-service and custom ML applications
  • Start with one or two data scientists, then train and onboard more homegrown experts
  • Using custom model combinations (ensemble models) that yield better or broader predictions

Cons of integrated structure:

  • Computing infrastructure is required in case of custom ML use
  • Cross-silo management takes considerable effort
  • Significant investments in data science talent acquisition
  • Data science talent engagement and retention challenges

Specialized data science department

To reduce management effort and build an all-encompassing machine learning framework, you can run the entire machine learning workflow within an independent data science department. This approach entails the highest cost. All operations, from data cleaning and model training to building front-end interfaces, are realized by a dedicated data science team. It doesn’t necessarily mean that all team members should have a data science background, but they should acquire technology infrastructure and service management skills.

A specialized structure model aids in addressing complex data science tasks that include research, use of multiple ML models tailored to various aspects of decision-making, or multiple ML-backed services. In the case of large organizations, specialized data science teams can supplement different business units and operate within their specific fields of analytical interest.

Pros of specialized data science department:

  • Centralized data science management and increased problem-solving capacities
  • Realizing the full potential of both as-a-service and custom ML applications
  • Solving complex prediction problems that require deep research or building segmented model factories (that operate automatically across different segments and business units)
  • Setting a fully-featured data science playground to foster innovation
  • Greater scalability potential

Cons of specialized data science department:

  • Building and maintaining a complex computational infrastructure
  • Heavy investments in data science talent acquisition
  • Data science talent engagement and retention challenges

Data-science-team-structures

Regardless of what structure you opt for to start, having the right data science and analytics talent is critical. Who are the people you should look for?

Data science team roles

Let’s talk about skillsets of data scientists. Unfortunately, the term data scientist expanded and became too vague in recent years. After data science appeared in the business spotlight, there is no consensus developed regarding what the skillset of a data scientist is. Matthew Mayo, Data Scientist and the Deputy Editor of KDNuggets, argues: When I hear the term data scientist, I tend to think of the unicorn, and all that it entails, and then remember that they don’t exist, and that actual data scientists play many diverse roles in organizations, with varying levels of business, technical, interpersonal, communication, and domain skills.

This is true. It’s hard to find unicorns, but it’s possible to grow them from people with niche expertise in data science. We at AltexSoft consider these skills when hiring machine learning specialists:

Data-scientist-skillset


Interested in what the future will bring? Download our 2023 Technology Trends eBook for free.

Consent

Keep in mind that even professionals with this hypothetical skillset usually have their core strengths, which should be considered when distributing roles within a team. In most cases, acquiring talents will entail further training depending on their background.

But people and their roles are two different things. For instance, if your team model is the integrated one, an individual may combine multiple roles. So, let’s disregard how many actual experts you may have an outline the roles themselves. Obviously, many skillsets across roles may intersect.

Chief Analytics Officer/Chief Data Officer

In our recent whitepaper on machine learning, we broadly discussed this key leadership role. CAO, a business translator, bridges the gap between data science and domain expertise acting both as a visionary and a technical lead. You may get a better idea by looking the visualization below.

Preferred skills: data science and analytics, programming skills, domain expertise, leadership and visionary abilities

Chief Analytics Officer

Data analyst

The role implies proper data collection and interpretation activities. An analyst ensures that collected data is relevant and exhaustive while also interpreting the analytics results. Some companies, like IBM or HP, also require data analysts to have visualization skills to convert alienating numbers into tangible insights through graphics.

Preferred skills: R, Python, JavaScript, C/C++, SQL

Business analyst

A business analyst basically realizes a CAO’s functions but on the operational level. This implies converting business expectations into data analysis. If your core data scientist lacks domain expertise, a business analyst bridges this gulf.

Preferred skills: data visualization, business intelligence, SQL

Data scientist (not a data science unicorn)

Assuming you aren’t hunting unicorns, a data scientist is a person who solves business tasks using machine learning and data mining techniques. If this is too fuzzy, the role can be narrowed down to data preparation and clean with further model training and evaluation.

Preferred skills: R, SAS, Python, Matlab, SQL, NoSQL, Hive, Pig, Hadoop, Spark

Data Architect

This role is critical for working with large amounts of data (you guessed it, Big Data). However, if you don’t solely rely on MLaaS cloud platforms, this role is critical to warehouse the data, define database architecture, centralize data, and ensure integrity across different sources. For large distributed systems and big datasets, the architect is also in charge of performance.

Preferred skills: SQL, NoSQL, XML, Hive, Pig, Hadoop, Spark

Data engineer

Engineers implement, test, and maintain infrastructural components that data architects design. Realistically, the role of an engineer and the role of an architect can be combined in one person. The set of skills is very close.

Preferred skills: SQL, NoSQL, Hive, Pig, Matlab, SAS, Python, Java, Ruby, C++, Perl

Application/data visualization engineer

Basically, this role is only necessary for a specialized data science model. In other cases, software engineers come from IT units to deliver data science results in applications that end-users face. And it’s very likely that an application engineer or other developers from front-end units will oversee end-user data visualization.

Preferred skills: programming, JavaScript (for visualization), SQL, NoSQL

Team assembly and scaling

The initial challenge of talent acquisition in data science, besides the overall scarcity of experts, is the high salary expectations. According to O’Reilly Data Science Salary Survey 2016, the median annual base salary is $87,000, while in the USA the figure reaches $105,000. These numbers significantly vary depending on geography, specific technical skills, organization sizes, gender, industry, and education. If you decide to hire skilled analytics experts, the further challenges also include engagement and retention.

The intellectual curiosity in combination with the high demand challenges organizations to engage data scientists with creative and explorative projects. Due to these reasons, the IT-centric team structure which leverages existing sources is a promising alternative on the initial levels of machine learning adoption. Thus, engineers can acquire some analytics skills through ML-as-a-service solutions with friendly interfaces.

Another way to address the talent scarcity and budget limitations is to develop approachable machine learning platforms that would welcome new people from IT and enable further scaling. Even if no experienced data scientists can be hired, some organizations bypass this barrier by building relationships with educational institutions. In the US, there are about a dozen PhD programs emphasizing data science and numerous boot camps with 12-month-or-so courses.

Critical thing to be aware of

If you ask AltexSoft’s data science experts what the current state of AI/ML across industries is, they will likely point out two main issues: 1. Business executives still need to be convinced that reasonable ROI of ML investments exists. 2. If they are convinced and understand the value proposition and market demand, they may lack technical skills and resources to make products a reality.

These barriers are mostly due to digital culture in organizations. Efficient data processes challenge C-level executives to embrace horizontal decision-making. Frontline managers with access to analytics have more operational freedom to make data-driven decisions, while top-level management oversees a strategy. This reduces management effort and eventually mitigates gut-feeling-decision risks. Basically, the cultural shift defines the end success of building a data-driven business. As McKinsey argues, setting a culture is probably the hardest part, while the rest is manageable.

Categories: Artificial Intelligence, Big Data, Strategy
Tags: big data engineer, big data scientist, data architect, data science, data scientist, machine learning

About Oleksandr Kolisnykov

Oleksandr is a tech journalist and content strategist at AltexSoft who focuses on covering machine learning applications in business.

Primary Sidebar

E-mail Newsletter

Sign up to receive email updates daily and to hear what's going on with us!

Publish
AN Article
Submit
a press release
List
AN Event
Create
A Job Post

Related Articles

The Advantages of IT Staff Augmentation Over Traditional Hiring

May 4, 2023 By Mukesh Ram

The State of Digital Asset Management in 2023

May 3, 2023 By pimcoremkt

Test Data Management – Implementation Challenges and Tools Available

May 1, 2023 By yash.mehta262

Related Jobs

  • Software Engineer | South Yorkshire, GB - February 07, 2023
  • Software Engineer with C# .net Investment House | London, GB - February 07, 2023
  • Senior Java Developer | London, GB - February 07, 2023
  • Software Engineer – Growing Digital Media Company | London, GB - February 07, 2023
  • LBG Returners – Senior Data Analyst | Chester Moor, GB - February 07, 2023
More Jobs

Tags

AI Amazon analysis analytics app application Artificial Intelligence BI Big Data blockchain business China Cloud Companies company costs crypto customers Data development digital engineer environment experience future Google+ government information learning machine learning market mobile Musk news public research security share skills social social media software strategy technology twitter

Related Events

  • 6th Middle East Banking AI & Analytics Summit 2023 | Riyadh, Saudi Arabia - May 10, 2023
  • Data Science Salon NYC: AI & Machine Learning in Finance & Technology | The Theater Center - December 7, 2022
  • Big Data LDN 2023 | Olympia London - September 20, 2023
More events

Related Online Courses

  • Oracle Cloud Data Management Foundations Workshop
  • Data Science at Scale
  • Statistics with Python
More courses

Footer


Datafloq is the one-stop source for big data, blockchain and artificial intelligence. We offer information, insights and opportunities to drive innovation with emerging technologies.

  • Facebook
  • LinkedIn
  • RSS
  • Twitter

Recent

  • 5 Reasons Why Modern Data Integration Gives You a Competitive Advantage
  • 5 Most Common Database Structures for Small Businesses
  • 6 Ways to Reduce IT Costs Through Observability
  • How is Big Data Analytics Used in Business? These 5 Use Cases Share Valuable Insights
  • How Realistic Are Self-Driving Cars?

Search

Tags

AI Amazon analysis analytics app application Artificial Intelligence BI Big Data blockchain business China Cloud Companies company costs crypto customers Data development digital engineer environment experience future Google+ government information learning machine learning market mobile Musk news public research security share skills social social media software strategy technology twitter

Copyright © 2023 Datafloq
HTML Sitemap| Privacy| Terms| Cookies

  • Facebook
  • Twitter
  • LinkedIn
  • WhatsApp

In order to optimize the website and to continuously improve Datafloq, we use cookies. For more information click here.

settings

Dear visitor,
Thank you for visiting Datafloq. If you find our content interesting, please subscribe to our weekly newsletter:

Did you know that you can publish job posts for free on Datafloq? You can start immediately and find the best candidates for free! Click here to get started.

Not Now Subscribe

Thanks for visiting Datafloq
If you enjoyed our content on emerging technologies, why not subscribe to our weekly newsletter to receive the latest news straight into your mailbox?

Subscribe

No thanks

Privacy Overview

This website uses cookies so that we can provide you with the best user experience possible. Cookie information is stored in your browser and performs functions such as recognising you when you return to our website and helping our team to understand which sections of the website you find most interesting and useful.

Necessary Cookies

Strictly Necessary Cookie should be enabled at all times so that we can save your preferences for cookie settings.

If you disable this cookie, we will not be able to save your preferences. This means that every time you visit this website you will need to enable or disable cookies again.

Marketing cookies

This website uses Google Analytics to collect anonymous information such as the number of visitors to the site, and the most popular pages.

Keeping this cookie enabled helps us to improve our website.

Please enable Strictly Necessary Cookies first so that we can save your preferences!