Big Data Data Science QA Detailed

The document discusses various aspects of Big Data and Data Science, highlighting challenges such as data volume, variety, velocity, and quality. It differentiates between Business Intelligence and Data Science, outlines the modern analytical architecture, and describes key roles and skills in the Big Data ecosystem. Additionally, it covers Big Data Analytics, the data science process, and concepts like soft state eventual consistency.

Uploaded by

Sahil Sayyad

Big Data & Data Science - Q&A Summary

Q: What are the challenges with Big Data?

A: Big Data presents several challenges, including managing the enormous volume of data, handling various types and formats (structured, semi-structured, unstructured), processing data at high speed (velocity), and ensuring data quality, consistency, and security. Integration from multiple sources and the shortage of skilled professionals to handle Big Data tools and frameworks are also significant issues.

Q: Write a note on the data warehouse environment.

A: A data warehouse is a centralized system designed for reporting and data analysis. It stores large volumes of structured data from different sources. The environment typically includes source systems (ERP, CRM), ETL processes (Extract, Transform, Load), a central repository (the data warehouse itself), data marts, and tools for reporting and business intelligence. It is time-variant (it keeps historical data), non-volatile (loaded data is not changed in place), and optimized for querying and analysis rather than transaction processing.
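
The ETL flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the source rows, the table name sales_fact, and every function name are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows as they might arrive from two source systems (e.g. ERP and CRM).
ERP_ROWS = [{"order_id": 1, "amount": "120.50", "region": "north "},
            {"order_id": 2, "amount": "80.00", "region": "South"}]
CRM_ROWS = [{"order_id": 3, "amount": "42.25", "region": "NORTH"}]


def extract():
    """Extract: gather raw records from every source system."""
    return ERP_ROWS + CRM_ROWS


def transform(rows):
    """Transform: cast types and normalize values into one consistent shape."""
    return [(r["order_id"], float(r["amount"]), r["region"].strip().lower())
            for r in rows]


def load(conn, rows):
    """Load: append the cleaned rows to the central fact table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


def sales_by_region(conn):
    """A reporting-style query of the kind a warehouse is optimized for."""
    cur = conn.execute("SELECT region, SUM(amount) FROM sales_fact "
                       "GROUP BY region ORDER BY region")
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse (assumption)
load(conn, transform(extract()))
report = sales_by_region(conn)
```

The transform step is where inconsistent source formats ("north ", "NORTH") are reconciled, so that the reporting query sees one region value per region.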

Q: Explain the differences between BI and Data Science.

A: Business Intelligence (BI) uses historical data to generate dashboards, reports, and visualizations to support business decisions. It is primarily descriptive in nature. Data Science, on the other hand, is predictive and prescriptive, using statistical methods, algorithms, and machine learning to discover patterns and forecast future trends. BI tools include Tableau and Power BI, while data scientists use Python, R, and ML libraries.
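
The descriptive versus predictive distinction can be shown on a toy series. The sketch below is only an illustration: the monthly figures are made up, the function names are hypothetical, and a plain least-squares line stands in for whatever model a data scientist would actually choose.

```python
from statistics import mean

# Hypothetical monthly sales figures (months 1..6); illustrative data only.
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(values):
    """BI-style descriptive summary of what has already happened."""
    return {"total": sum(values), "average": mean(values), "best": max(values)}


def forecast_next(values):
    """Data-science-style prediction: fit a least-squares line, extrapolate one step."""
    n = len(values)
    xs = range(1, n + 1)
    x_bar, y_bar = mean(xs), mean(values)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n + 1)


summary = describe(monthly_sales)           # answers "what happened?"
prediction = forecast_next(monthly_sales)   # answers "what is likely next?"
```

The first function only summarizes past data, which is what a dashboard does; the second fits a model to that data and uses it to estimate a value that has not been observed yet.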

Q: Describe the current analytical architecture for data scientists.

A: A modern data science architecture includes multiple layers: data ingestion from APIs, sensors, or databases; data storage using data lakes and warehouses; processing with distributed tools like Apache Spark or Hadoop; model development using Python, R, and ML frameworks; and finally deployment using MLOps tools like MLflow and Docker. Visualization tools such as Tableau or Power BI are used to communicate findings.

Q: What are the key roles in the new Big Data ecosystem?

A: The new Big Data ecosystem includes roles like Data Engineers who build data pipelines, Data Scientists who analyze and model data, Analysts who interpret data trends, Machine Learning Engineers who deploy models, and Data Architects who design the infrastructure. Other roles include BI Developers, Data Stewards, MLOps Engineers, and Chief Data Officers. Collaboration among these roles ensures effective data-driven decision making.

Q: What are the key skill sets and behavioral characteristics of a data scientist?

A: A successful data scientist possesses technical skills like programming (Python, R), statistics, machine learning, data wrangling, and data visualization. Familiarity with databases, cloud platforms, and Big Data tools is also essential. Behaviorally, they should be curious, analytical, detail-oriented, and good communicators. They must collaborate well with teams and adapt quickly to evolving data and technology landscapes.

Q: What is Big Data Analytics? Explain in detail with an example.

A: Big Data Analytics is the process of analyzing large, diverse datasets to uncover patterns, correlations, and trends. It involves collecting data from multiple sources, cleaning and processing it, applying analytical models, and visualizing insights. For example, Amazon uses Big Data Analytics to recommend products by analyzing user behavior, search history, and purchase data in real time to enhance customer experience.
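
As a rough illustration of the recommendation example, the sketch below counts which items are bought together and suggests the most frequent companions. It is a deliberately simple stand-in and not a description of how Amazon's system works: the baskets, item names, and function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: each set is one customer's basket.
BASKETS = [
    {"laptop", "mouse", "keyboard"},
    {"laptop", "mouse"},
    {"laptop", "monitor"},
    {"phone", "case"},
]


def co_purchase_counts(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(item, baskets, top_n=2):
    """Return up to top_n items most often bought together with `item`."""
    counts = co_purchase_counts(baskets)
    related = [(other, n) for (first, other), n in counts.items() if first == item]
    # Sort by descending count, then by name, so ties resolve deterministically.
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:top_n]]


suggestions = recommend("laptop", BASKETS)
```

At real scale the same counting idea would run on a distributed engine over far larger histories, but the pattern (collect behavior, aggregate it, rank candidates) is the same.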

Q: Write a short note on data science and the data science process.

A: Data Science is the field of extracting meaningful insights from data using analytical, statistical, and machine learning techniques. The process includes problem definition, data collection, cleaning, exploratory analysis, feature engineering, model building, evaluation, and deployment. This cycle helps businesses make data-driven decisions, such as predicting customer churn or detecting fraud.
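
A compressed version of this cycle can be sketched on a toy churn problem. Everything here is an assumption made for illustration: the records are invented, the function names are hypothetical, and a single learned threshold stands in for a real model. Exploratory analysis, feature engineering, and deployment are left out to keep the example short.

```python
# Hypothetical raw records: (customer_id, days_since_last_login, churned).
RAW = [(1, 2, 0), (2, 45, 1), (3, None, 0), (4, 30, 1),
       (5, 5, 0), (6, 60, 1), (7, 8, 0), (8, 50, 1)]


def clean(records):
    """Cleaning: drop rows with missing values."""
    return [r for r in records if None not in r]


def split(records, n_test=2):
    """Hold out the last n_test rows for evaluation."""
    return records[:-n_test], records[-n_test:]


def accuracy(threshold, records):
    """Share of rows where 'inactive for >= threshold days' matches the churn label."""
    hits = sum((days >= threshold) == bool(label) for _, days, label in records)
    return hits / len(records)


def fit_threshold(train):
    """Model building: pick the inactivity threshold with the best training accuracy."""
    candidates = sorted({days for _, days, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))


train, test = split(clean(RAW))            # data collection and cleaning
threshold = fit_threshold(train)           # model building
test_accuracy = accuracy(threshold, test)  # evaluation on held-out rows
```

Evaluating on rows the model never saw during fitting is the step that tells you whether the learned rule generalizes, which is why the split happens before the threshold is chosen.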

Q: Write a short note on soft state and eventual consistency.

A: Soft state means that the state of a system can change over time even without new input, because earlier updates are still propagating between nodes. Eventual consistency means that, in a distributed system, all updates will propagate and, if no new updates arrive, the data will become consistent across nodes over time. This model favors high availability and scalability and is commonly used in NoSQL databases like Cassandra and DynamoDB, where immediate (strong) consistency is not always critical.
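
The behavior described above can be simulated with a few in-memory replicas. This is a simplified sketch under stated assumptions: the Replica class and its methods are hypothetical, conflicts are resolved by a last-write-wins timestamp, and replicas exchange data pairwise. Real databases use more elaborate mechanisms than this.

```python
class Replica:
    """One node holding key -> (timestamp, value); newer timestamps win."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def write(self, key, value, timestamp):
        """Accept a write locally without waiting for other replicas."""
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        """Return the locally known value, which may be stale or missing."""
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_with(self, other):
        """Pairwise exchange: both replicas end up with the newest version of each key."""
        for key, (ts, value) in list(self.store.items()):
            other.write(key, value, ts)
        for key, (ts, value) in list(other.store.items()):
            self.write(key, value, ts)


a, b, c = Replica("a"), Replica("b"), Replica("c")
a.write("cart", ["book"], timestamp=1)
c.write("cart", ["book", "pen"], timestamp=2)

# Before any exchange the replicas disagree: reads can be stale or empty.
before = [r.read("cart") for r in (a, b, c)]

# Background exchanges propagate the updates; afterwards all replicas agree.
a.sync_with(b)
b.sync_with(c)
a.sync_with(b)
after = [r.read("cart") for r in (a, b, c)]
```

Replica b never receives a client write, yet its contents change during the exchanges, which is the "soft state" idea. Each write is accepted immediately by a single node, which is what buys availability at the cost of temporarily inconsistent reads.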
