Datafloq

Data and Technology Insights

Challenges Faced by ECommerce Businesses in Big Data Collection and Management

Chris Low / 4 min read.
August 2, 2017

E-commerce is an important domain for big data mining because of the massive volume of clickstream and transactional data it generates. Organizations collect this data, clean and transform it, and then analyze it to uncover trends that drive business growth. This article discusses the challenges of big data mining in three sections: data collection, data cleansing, and data processing.

The following challenges apply to data collection in the e-commerce domain:

Sampling at collection

Large e-commerce websites generate, on average, between 5 and 10 million page views every day. Logging every one of these user sessions on the backend can put tremendous strain on the servers, not to mention the storage required to hold the data. One way to counter this challenge is to sample at the source. Sampling clickstream collection addresses both issues, but it introduces new problems. For one, sampled data cannot accurately capture rare events such as a search for a particular term or a credit card authorization failure. Furthermore, some business requirements, such as payment for advertising click-through referrals, require exact (rather than approximate) statistics.
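One way to reconcile sampling with the need for exact counts can be sketched as follows. This is a minimal illustration, not a production design: the event names in ALWAYS_LOG, the default 10% rate, and the function names are all assumptions made for the example. Sessions are sampled deterministically by hashing the session ID, so a sampled session is logged in full, while event types that need exact statistics bypass sampling entirely.

```python
import hashlib

# Hypothetical event types that need exact counts and are never sampled away.
ALWAYS_LOG = {"ad_click_through", "card_auth_failure"}


def in_sample(session_id: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` (0.0 to 1.0) of all sessions."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < rate * 2**64


def should_log(session_id: str, event_type: str, rate: float = 0.1) -> bool:
    """Log critical events always; log everything else only for sampled sessions."""
    if event_type in ALWAYS_LOG:
        return True
    return in_sample(session_id, rate)
```

Hashing the session ID, rather than drawing a random number per event, keeps each sampled session complete, which matters when the analysis follows a visitor's path through the site.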

Supporting changes in demographics

Customer demographics change: people get married, their children grow up, their salaries change, and so on. As these change, the customer's needs, which are being modeled, change too. The challenge is to keep track of these changes and to support them in the analysis.

The following challenges apply to data cleansing in the e-commerce domain:

Detecting bots

Spiders and crawlers, collectively called bots, are automated programs that visit websites. Typical bots include web search engines (like Google), site monitoring software, and price and email harvesters. Because of the volume and type of traffic they generate, bots can dramatically change clickstream patterns at a website and skew any clickstream statistics.

For example, the average number of page views per visit with bot visits included has been observed to be 1.5 to 2 times the average with bot visits excluded. Large e-commerce websites typically see between 5% and 40% of their traffic come from bots and other automated sources. This makes clickstream analysis difficult, since most of these bots do not identify themselves as bots, and some pretend to be real visitors. The only way to keep bot visits from skewing the analysis is to apply a set of heuristics and manual labeling on a continuous basis.
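The kind of heuristic filter described above might look like the sketch below. The user-agent substrings, the robots.txt signal, and the 60 pages-per-minute threshold are illustrative assumptions, not values from any particular site. A real filter would need a longer, regularly updated rule set plus the manual labeling mentioned above.

```python
import re
from dataclasses import dataclass

# Assumed substrings; self-identifying bots often carry one of these.
BOT_UA_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


@dataclass
class Session:
    user_agent: str
    page_views: int
    duration_seconds: float
    fetched_robots_txt: bool = False


def looks_like_bot(s: Session, max_pages_per_minute: float = 60.0) -> bool:
    """Flag a session using three simple heuristics (all thresholds assumed)."""
    if BOT_UA_PATTERN.search(s.user_agent):
        return True  # the client identifies itself as a bot
    if s.fetched_robots_txt:
        return True  # browsers driven by humans rarely request robots.txt
    minutes = max(s.duration_seconds, 1.0) / 60.0
    return s.page_views / minutes > max_pages_per_minute  # implausibly fast


def avg_page_views(sessions) -> float:
    """Average page views per visit; 0.0 for an empty list."""
    sessions = list(sessions)
    if not sessions:
        return 0.0
    return sum(s.page_views for s in sessions) / len(sessions)
```

Comparing `avg_page_views(all_sessions)` with the same average over the sessions for which `looks_like_bot` is false gives a quick sense of how much automated traffic inflates the statistic.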

It is worth mentioning that page tagging methods of clickstream collection, which execute blocks of JavaScript in the client's browser and log the statistics returned by the JavaScript at a server, avoid bots because they depend on JavaScript, which bots rarely execute. At the same time, a human visitor who has JavaScript turned off in the browser, or who clicks on a link before the JavaScript code has finished executing, could have the visit erroneously flagged as bot traffic. Such visits normally make up around five percent of all human traffic and lead to inaccurate clickstream measurement.



Performing regular deduplication of customer accounts

Transactional systems do not usually offer checks to prevent the duplication of customer records. In some businesses, one customer may have multiple accounts and it may not always be possible to consolidate all activities performed by one customer into a single record.

Challenges also occur when the same kiosk or terminal is used by multiple people to log on to a website. If the website or platform does not have enough tracking parameters to uniquely identify each of these visitors separately, then this leads to a scenario where clickstream analysis of one user could actually include information from several users.
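A periodic deduplication pass over customer accounts can be sketched as below. It assumes each account record is a dict with hypothetical `id` and `email` fields, and that a normalized email address is a good enough matching key. In practice, matching often needs several fields (name, phone, address) and fuzzy comparison. Note that this addresses duplicate accounts only, not the shared-terminal problem.

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def group_duplicates(accounts):
    """Return lists of account IDs that share the same normalized email.

    `accounts` is an iterable of dicts with assumed keys 'id' and 'email'.
    """
    groups = defaultdict(list)
    for account in accounts:
        groups[normalize_email(account["email"])].append(account["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Each returned group is a candidate for consolidation into a single customer record, ideally after a manual or rule-based review.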

The following challenges apply to data processing in the e-commerce domain:

Supporting hierarchical attributes

Supporting hierarchical attributes is important in practice. A few algorithms have been designed to support hierarchical attributes directly, but they do not scale to large hierarchies. Automating the effective use of hierarchies remains challenging.
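To make the idea concrete, a product category tree is a typical hierarchical attribute. One common workaround, when an algorithm cannot handle hierarchies directly, is to generalize each value to an ancestor at a chosen depth before mining. The sketch below assumes a small, made-up category tree stored as a child-to-parent mapping; it illustrates the preprocessing step only and does not solve the scaling problem described above.

```python
# Hypothetical category hierarchy, stored as child -> parent (None marks the root).
PARENT = {
    "running shoes": "shoes",
    "sandals": "shoes",
    "shoes": "apparel",
    "t-shirts": "apparel",
    "apparel": None,
}


def path_to_root(category: str) -> list:
    """Return the chain of categories from the root down to `category`."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path[::-1]


def generalize(category: str, depth: int) -> str:
    """Roll a category up to its ancestor at `depth` (0 is the root).

    Categories shallower than `depth` are returned unchanged.
    """
    path = path_to_root(category)
    return path[min(depth, len(path) - 1)]
```

Choosing the depth is the hard part: too shallow and patterns are too coarse to act on, too deep and the data for each value becomes sparse.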

Related products

One of the common strategies e-commerce websites use to drive up the average order value is promoting related products. Algorithms use big data analytics to map the purchasing behavior of every customer and identify purchase patterns. By extrapolating from the purchasing history of other customers who bought a particular product, the algorithm can predict the kinds of products a buyer may find interesting.
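A very simple form of this idea is co-purchase counting: products that frequently appear in the same order are treated as related. The sketch below is one possible baseline under that assumption, with hypothetical function names and orders represented as lists of product names. It is not the method of any particular retailer, and real recommenders usually add normalization for product popularity.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(orders):
    """Count, for every product, how often each other product shares an order."""
    counts = defaultdict(Counter)
    for order in orders:
        # set() ignores repeated items; sorted() makes the pairing deterministic
        for a, b in combinations(sorted(set(order)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def related(product, counts, top_n=3):
    """Return up to `top_n` products most often bought together with `product`."""
    return [p for p, _ in counts.get(product, Counter()).most_common(top_n)]
```

For example, given a handful of orders in which tents are usually bought with sleeping bags, `related("tent", co_purchase_counts(orders))` would rank the sleeping bag first.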

Handling unknown and invalid attribute values

E-commerce websites routinely tag the products in their inventory with attributes such as color, size, and weight to help with searches and product filtering. These attributes are also extremely useful for data mining, since they can be used to find generalizations and patterns in user behavior.

Some attributes influence user behavior in certain classes of products but not in others. For example, size makes sense for clothes and shoes, but not for books. For books, the size attribute would have a NULL value. In this case, NULL means "not applicable" rather than "unknown", and it needs to be treated differently. The two interpretations of NULL are distinguished using metadata: for every attribute, the metadata determines whether a NULL value should be treated as not applicable or as unknown. "Not applicable" is a distinct value for mining purposes, whereas "unknown" implies that the attribute is relevant but its value is missing. Not many data mining algorithms can correctly accommodate this subtle difference.
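The metadata-driven distinction can be sketched as a small preprocessing step. The APPLICABLE table, the "N/A" sentinel, and the function name below are assumptions made for illustration. The sketch also assumes the metadata is keyed by product class, which is one plausible way to encode where an attribute is relevant.

```python
from typing import Optional

NOT_APPLICABLE = "N/A"  # a distinct category value that mining can use
UNKNOWN = None          # the attribute is relevant, but its value is missing

# Hypothetical metadata: which attributes are meaningful for each product class.
APPLICABLE = {
    "clothes": {"color", "size"},
    "shoes": {"color", "size"},
    "books": {"weight"},
}


def interpret(product_class: str, attribute: str, raw_value: Optional[str]):
    """Resolve a raw NULL into either 'not applicable' or 'unknown'."""
    if raw_value is not None:
        return raw_value
    if attribute in APPLICABLE.get(product_class, set()):
        return UNKNOWN  # should have a value; impute or skip downstream
    return NOT_APPLICABLE  # attribute does not apply to this product class
```

With this in place, a missing size on a book becomes the explicit category "N/A", while a missing size on a shoe stays a genuine gap that can be imputed or excluded.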

Big data mining opens up new avenues of business growth for e-commerce websites. The challenges described above need to be addressed in order to gain an edge over competitors. Overcoming them will help both customers and service providers in many ways.


About Chris Low

Christopher Low is a project manager and lead developer with Sherlock Software. He is also the founder and owner of MyTeamPlan, a desktop-based project management software tool that is targeted at small and medium businesses.
