Google Data Analytics Professional Certificate Answers - Coursera (2024)

Table of Contents
Course 1 – Foundations: Data, Data, Everywhere
  • Week 1 – Introducing data analytics
  • Week 2 – All about analytical thinking
  • Week 3 – The wonderful world of data
  • Week 4 – Set up your toolbox
  • Week 5 – Endless career possibilities
  • Course challenge

Course 2 – Ask Questions to Make Data-Driven Decisions
  • Week 1 – Effective questions
  • Week 2 – Data-driven decisions
  • Week 3 – More spreadsheet basics
  • Week 4 – Always remember the stakeholder
  • Course challenge

Course 3 – Prepare Data for Exploration
  • Week 1 – Data types and structures
  • Week 2 – Bias, credibility, privacy, ethics, and access
  • Week 3 – Databases: Where data lives
  • Week 4 – Organizing and protecting your data
  • Course challenge

Course 4 – Process Data from Dirty to Clean
  • Week 1 – The importance of integrity
  • Week 2 – Sparkling-clean data
  • Week 3 – Cleaning data with SQL
  • Week 4 – Verify and report on your cleaning results
  • Course challenge

Course 5 – Analyze Data to Answer Questions
  • Week 1 – Organizing data to begin analysis
  • Week 2 – Formatting and adjusting data
  • Week 3 – Aggregating data for analysis
  • Week 4 – Performing data calculations
  • Course challenge

Course 6 – Share Data Through the Art of Visualization
  • Week 1 – Visualizing data
  • Week 2 – Creating data visualizations with Tableau
  • Week 3 – Crafting data stories
  • Week 4 – Developing presentations and slideshows
  • Course challenge

Course 7 – Data Analysis with R Programming
  • Week 1 – Programming and data analytics
  • Week 2 – Programming using RStudio
  • Week 3 – Working with data in R
  • Week 4 – More about visualizations, aesthetics, and annotations
  • Week 5 – Documentation and reports
  • Course challenge

Course 8 – Google Data Analytics Capstone: Complete a Case Study
  • Week 1 – Learn about capstone basics
  • Week 3 – Optional: Using your portfolio
  • Week 4

€30

CertificationAnswers.com

Whether you’re just getting started or want to take the next step in the high-growth field of data analytics, professional certificates from Google can help you gain in-demand skills. You’ll learn about R programming, SQL, Python, Tableau and more.

Data analysts prepare, process, and analyze data to help inform business decisions. They create visualizations to share their findings with stakeholders and provide recommendations driven by data.

This certification is part of Google Career Certificates.

Complete a Google Career Certificate to get exclusive access to CareerCircle, which offers free 1-on-1 coaching, interview and career support, and a job board to connect directly with employers, including over 150 companies in the Google Career Certificates Employer Consortium.

Language: English

Certification URLs:

grow.google/certificates/data-analytics

coursera.org/google-certificates/data-analytics-certificate

Questions:

Course 1 – Foundations: Data, Data, Everywhere

Week 1 – Introducing data analytics

Fill in the blank: A collection of elements that interact with one another to produce, manage, store, organize, analyze, and share data is known as a data ______ .

  • environment
  • ecosystem
  • model
  • cloud

Fill in the blank: In data science, ________ is when a data analyst uses their unique past experiences to understand the story the data is telling.

  • rational thought
  • gut instinct
  • personal opinion
  • awareness

Fill in the blank: When posting in a discussion forum, you should make sure that any articles discussed are _______ to data analytics.

  • unique
  • well known
  • relevant
  • popular
  1. Data analysis is the various elements that interact with one another in order to provide, manage, store, organize, analyze, and share data.
  • True
  • False
  1. In data analytics, a model is a group of elements that interact with one another.
  • True
  • False
  1. Fill in the blank: The primary goal of a data _____ is to create new questions using data.
  • designer
  • analyst
  • engineer
  • scientist
  1. Fill in the blank: The term _____ is defined as an intuitive understanding of something with little or no explanation.
  • personal opinion
  • rational thought
  • gut instinct
  • awareness
  1. A company defines a problem it wants to solve. Then, a data analyst gathers relevant data, analyzes it, and uses it to draw conclusions. The analyst shares their analysis with subject-matter experts, who validate the findings. Finally, a plan is put into action. What does this scenario describe?
  • Data science
  • Data-driven decision-making
  • Customer service
  • Identification of trends
  1. What do subject-matter experts do to support data-driven decision-making? Select all that apply.
  • Offer insights into the business problem
  • Review the results of data analysis and identify any inconsistencies
  • Collect, transform, and organize data
  • Validate the choices made as a result of the data insights
  1. You have just finished analyzing data for a marketing project. Before moving forward, you share your results with members of the marketing team to see if they might have additional insights into the business problem. What practice does this support?
  • Data analytics
  • Data science
  • Data-driven decision-making
  • Data management
  1. You read an interesting article about data analytics in a magazine and want to share some ideas from the article in the discussion forum. In your post, you include the author and a link to the original article. This would be an inappropriate use of the forum.
  • True
  • False
  1. Which of the following options describes data analysis?
  • The various elements that interact with one another in order to provide, manage, store, organize, analyze, and share data
  • Creating new ways of modeling and understanding the unknown by using raw data
  • The collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making
  • Using facts to guide business strategy
  1. In data analytics, what term describes a collection of elements that interact with one another to produce, manage, store, organize, analyze, and share data?
  • The cloud environment
  • A modeling system
  • A data ecosystem
  • A database
  1. Select the best description of gut instinct.
  • Choosing facts that complement your personal experiences
  • An intuitive understanding of something with little or no explanation
  • Manipulating data to match your intuition
  • Using your innate ability to analyze results
  1. A furniture manufacturer wants to find a more environmentally friendly way to make its products. A data analyst helps solve this problem by gathering relevant data, analyzing it, and using it to draw conclusions. The analyst then shares their analysis with subject-matter experts from the manufacturing team, who validate the findings. Finally, a plan is put into action. This scenario describes data-driven decision-making.
  • True
  • False
  1. Fill in the blank: _______ are an important part of data-driven decision-making because they are people familiar with the business problem and can offer insight into the results of data analysis.
  • Customers
  • Competitors
  • Subject-matter experts
  • Stakeholders
  1. Consulting with experts in the marketing department about your marketing analysis is an example of what process?
  • Data analytics
  • Data-driven decision-making
  • Data management
  • Data science
  1. You have recently subscribed to an online data analytics magazine. You really enjoyed an article and want to share it in the discussion forum. Which of the following would be appropriate in a post? Select all that apply.
  • Checking your post for typos or grammatical errors.
  • Including an advertisement for how to subscribe to the data analytics magazine.
  • Giving credit to the original author.
  • Including your own thoughts about the article.
  1. Which of the following could be elements of a data ecosystem? Select all that apply.
  • Sharing data
  • Producing data
  • Gaining insights
  • Managing data
  1. If you are using data-driven decision-making, what action steps would you take? Select all that apply.
  • Surveying customers about results, conclusions, and recommendations
  • Gathering and analyzing data
  • Sharing your results with subject matter experts
  • Drawing conclusions from your analysis
  1. What do subject-matter experts do to support data-driven decision-making? Select all that apply.
  • Collect, transform, and organize data
  • Offer insights into the business problem
  • Review the results of data analysis and identify any inconsistencies
  • Validate the choices made as a result of the data insights
  1. Fill in the blank: When following data-driven decision-making, a data analyst will consult with ______ .
  • subject matter experts
  • stakeholders
  • managers
  • customers
  1. What is the purpose of data analysis? Select all that apply.
  • To drive informed decision-making
  • To create models of data
  • To draw conclusions
  • To make predictions
  1. A data analyst is someone who does what?
  • Designs new products
  • Creates new questions using data
  • Solves engineering problems
  • Finds answers to existing questions by creating insights from data sources
  1. What tactics can a data analyst use to effectively blend gut instinct with facts? Select all that apply.
  • Use their knowledge of how their company works to better understand a business need.
  • Focus on intuition to choose which data to collect and how to analyze it.
  • Ask how to define success for a project, but rely most heavily on their own personal perspective.
  • Apply their unique past experiences to their current work, while keeping in mind the story the data is telling.
  1. To get the most out of data-driven decision-making, it’s important to include insights from people very familiar with the business problem. What are these people called?
  • Subject-matter experts
  • Customers
  • Stakeholders
  • Competitors
  1. A music streaming service is looking to increase user engagement on its platform. The CEO decides to leverage the company's user data and tasks the data analysts with uncovering unknown trends and characteristics of the company's user base. This strategy is known as what?
  • Data analytics decision-making
  • Data science decision-making
  • Data management decision-making
  • Data-driven decision-making
  1. You read an interesting article in a magazine and want to share it in the discussion forum. What should you do when posting? Select all that apply.
  • Check your post for typos or grammatical errors
  • Include your email address for people to send questions or comments
  • Make sure the article is relevant to data analytics
  • Take credit for creating the article
  1. A data scientist is someone who does what?
  • Creates new questions using data
  • Finds answers to existing questions by creating insights from data sources
  • Solves engineering problems
  1. Data analysts act as detectives to uncover clues within the data. Like a detective, a data analyst may use their _______ to solve business problems.
  • personal opinion
  • rational thought
  • gut instinct
  • awareness
  1. In data-driven decision-making, a data analyst would share their results with subject matter experts and draw conclusions from their analysis. What else would a data analyst do in data-driven decision-making?
  • Identification of trends
  • Determining the stakeholders.
  • Survey customers about results, conclusions, and recommendations
  • Gather and analyze data
  1. Fill in the blank: _________ is the act of consulting with subject-matter experts about the results of your data analysis.
  • Data analytics
  • Data science
  • Data management
  • Data-driven decision-making
  1. Data ______ is the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making.
  • science
  • analysis
  • ecosystem
  • life cycle
  1. Fill in the blank: The primary goal of a data _____ is to find answers to existing questions by creating insights from data sources.
  • engineer
  • scientist
  • analyst
  • designer
  1. Sharing your results with subject-matter experts and gathering and analyzing data are carried out in data-driven decision-making. What else is included in this process?
  • Determining the stakeholders
  • Identification of trends
  • Drawing conclusions from your analysis.
  • Surveying customers about results, conclusions, and recommendations
  1. Fill in the blank: The people very familiar with a business problem are called _____. They are an important part of data-driven decision-making.
  • subject-matter experts
  • customers
  • competitors
  • stakeholders
  1. Fill in the blank: When posting in a discussion forum, you should always check your post for _______ and grammatical errors.
  • support
  • typos
  • importance
  • popularity
  1. Fill in the blank: Hardware, software, and the cloud all interact with each other to store and organize data in a _____.
  • cloud environment
  • modeling system
  • database
  • data ecosystem
  1. Gut instinct is an intuitive understanding of something with little or no explanation.
  • True
  • False

Week 2 – All about analytical thinking

Fill in the blank: Gathering additional information about data to understand the broader picture is an example of understanding _____.

  • problems
  • data
  • knowledge
  • context

Correlation is the aspect of analytical thinking that involves figuring out the specific details that help you execute a plan.

  • True
  • False

What method involves asking multiple questions in order to get to the root cause of a problem?

  • The five whys
  • Strategizing
  • Curiosity
  • Inquiry
  1. A junior data analyst is seeking out new experiences in order to gain knowledge. They watch videos and read articles about data analytics. They ask experts questions. Which analytical skill are they using?
  • Data strategy
  • Having a technical mindset
  • Curiosity
  • Understanding context
  1. Identifying the motivation behind data collection and gathering additional information are examples of which analytical skill?
  • Data design
  • A technical mindset
  • Understanding context
  • Data strategy
  1. Having a technical mindset is an analytical skill involving what?
  • Managing people, processes, and tools
  • Understanding the condition in which something exists or happens
  • Breaking things down into smaller steps or pieces
  • Balancing roles and responsibilities
  1. Fill in the blank: Data strategy involves _____ the people, processes, and tools used in data analysis.
  • supervising
  • managing
  • choosing
  • visualizing
  1. Correlation is the aspect of analytical thinking that involves figuring out the specifics that help you execute a plan.
  • True
  • False
  1. What method involves asking numerous questions in order to get to the root cause of a problem?
  • Strategizing
  • The five whys
  • Curiosity
  • Inquiry
  1. Gap analysis is a method for examining and evaluating how a process works currently in order to get where you want to be in the future.
  • True
  • False
  1. Data-driven decision-making involves the five analytical skills: curiosity, understanding context, having a technical mindset, data design, and data strategy. Each plays a role in data-driven decision-making.
  • True
  • False

  1. Fill in the blank: The analytical skill of ______ involves seeking out new experiences in order to gain knowledge.
  • understanding context
  • having a technical mindset
  • data strategy
  • curiosity
  1. Breaking things down into smaller steps or pieces and working with them in an orderly and logical way describes which analytical skill?
  • Data strategy
  • Context
  • Curiosity
  • A technical mindset
  1. In data analysis, data strategy is the analytical skill that involves managing which of the following? Select all that apply.
  • People
  • Consent
  • Tools
  • Processes
  1. A grocery store owner notices that they sell more orange juice during the winter season, when people are more likely to get sick. After observing this for a couple of years, they decide to stock more orange juice during the winter. The store owner is using which quality of analytical thinking?
  • detail-oriented thinking
  • correlation
  • problem-orientation
  • visualization
  1. The five whys is a technique that involves asking, “Why?” five times in order to achieve what goal?
  • Identify the root cause of a problem
  • Visualize how a process should look in the future
  • Put a plan into action
  • Use facts to guide business strategy
  1. In data analysis, one often examines and evaluates how a process currently works in order to get it to where they want it to be in the future. This is known as what?
  • Building a data visualization
  • Gap analysis
  • Determining the stakeholders
  • Asking the five whys
  1. A company is seeing a decline in organizational efficiency. They decide to hire an outside organization to help increase overall performance. The data analyst, working for the newly contracted company, utilizes five analytical skills: curiosity, understanding context, having a technical mindset, data design, and data strategy to deliver the project goals. Once the project goals are met, the analyst informs the decision makers of their findings and the project is completed. What strategy did the data analyst use to complete this project?
  • Gut instinct
  • Gap analysis
  • Data-driven decision-making
  • The five whys
  1. Identifying the motivation behind the collection of a dataset is an example of the analytical skill of understanding context.
  • True
  • False
  1. A technical mindset involves the ability to break things down into smaller steps or pieces and work with them in an orderly and logical way.
  • True
  • False
  1. Data design is how you organize information; data strategy is the management of the people, processes, and tools used in data analysis.
  • True
  • False
  1. What method involves examining and evaluating how a process works currently in order to get where you want to be in the future?
  • The five whys
  • Strategy
  • Gap analysis
  • Data visualization
  1. Seeking out new challenges and experiences in order to learn is an example of which analytical skill?
  • Curiosity
  • Data strategy
  • Understanding context
  • Having a technical mindset
  1. Which of the following examples best describe the analytical skill of understanding context? Select all that apply.
  • Adding descriptive headers to columns of data in a spreadsheet
  • Working with facts in an orderly manner
  • Gathering additional information about data to understand the broader picture
  • Identifying the motivation behind the collection of a dataset
  1. Fill in the blank: In data analysis, data strategy involves managing the people, processes, and _____ .
  • projects
  • procedures
  • consent
  • tools
  1. Identifying a relationship between two or more pieces of data is known as what?
  • visualization
  • correlation
  • problem-orientation
  • detail-oriented thinking
  1. As a new data analyst, your boss asks you to perform a gap analysis on one of their current processes. What does this entail?
  • Building a data visualization
  • Asking the five whys
  • Examining and evaluating how a process works currently in order to get where you want to be in the future
  • Determining the stakeholders
  1. Fill in the blank: In data-driven decision-making, data analysts use the five analytical skills of curiosity, understanding context, having a technical mindset, data design, and _______ .
  • data strategy
  • forward-looking
  • intuition
  • efficiency
  1. The analytical skill of understanding context entails which of the following?
  • Breaking things down into smaller steps or pieces
  • Managing people, processes, and tools
  • Balancing roles and responsibilities
  • Understanding the condition in which something exists or happens
  1. Fill in the blank: _____ involves the ability to break things down into smaller steps or pieces and work with them in an orderly and logical way.
  • Data strategy
  • Curiosity
  • Context
  • A technical mindset
  1. Which analytical skill involves managing the people, processes, and tools used in data analysis?
  • Understanding context
  • Data design
  • Data strategy
  • Curiosity
  1. The manager at a music shop notices that more trombones are repaired on the days when Alex and Jasmine work the same shift. After some investigation, the manager discovers that Alex is excellent at fixing slides, and Jasmine is great at shaping mouthpieces. Working together, Alex and Jasmine repair trombones faster. The manager is happy to have discovered this relationship and decides to always schedule Alex and Jasmine for the same shifts. In this scenario, the manager used which quality of analytical thinking?
  • Visualization
  • Problem-orientation
  • Correlation
  • Big-picture thinking
  1. Fill in the blank: In order to get to the root cause of a problem, a data analyst should ask “Why?” ________ times.
  • five
  • three
  • seven
  • four
  1. A company is receiving negative comments on social media about their products. To solve this problem, a data analyst uses each of their five analytical skills: curiosity, understanding context, having a technical mindset, data design, and data strategy. This makes it possible for the analyst to use facts to guide business strategy and figure out how to improve customer satisfaction. What is this an example of?
  • Data science
  • Gap analysis
  • Data-driven decision-making
  • Data visualization
  1. Data analysts following data-driven decision-making use the analytical skills of curiosity, having a technical mindset, and data design. What other two analytical skills would they employ? Select all that apply.
  • knowledge
  • data strategy
  • efficiency
  • understanding context
  1. Curiosity is the analytical skill of using your instinct to solve problems.
  • True
  • False
  1. Adding descriptive headers to columns of data in a spreadsheet is an example of which analytical skill?
  • Having a technical mindset
  • Understanding context
  • Data strategy
  • Curiosity
  1. A company has recently tasked their data science team with figuring out what is causing the decline in production at one of their plants. The data analysts ask a number of questions trying to get to the root cause of the problem. This technique is known as what?
  • Inquiry
  • The five whys
  • Curiosity
  • Strategizing

Week 3 – The wonderful world of data

A business analyst recently completed a project that their company has decided to use to solve a larger business problem. What step is this in the data analysis process?

  • Process
  • Analyze
  • Act
  • Share

A set of instructions used to perform a specified calculation is known as what?

  • A particular value
  • A function
  • A predefined statement
  • A formula

Which of the following is an example of why a data analyst may generate a query?

  • Visualizing data
  • Requesting data
  • Collecting data
  • Recording data
  1. Fill in the blank: A business decides what kind of data it needs, how the data will be managed, and who will be responsible for it during the _____ stage of the data life cycle.
  • analyze
  • manage
  • plan
  • capture
  1. The destroy stage of the data life cycle might involve which of the following actions? Select all that apply.
  • Storing data for future use
  • Shredding paper files
  • Uploading data to the cloud
  • Using data-erasure software
  1. During the capture stage of the data life cycle, a data analyst may use spreadsheets to aggregate data.
  • True
  • False
  1. Describe how the data life cycle differs from data analysis.
  • The data life cycle deals with making informed decisions; data analysis is using tools to transform data.
  • The data life cycle deals with transforming and verifying data; data analysis is using the insights gained from the data.
  • The data life cycle deals with identifying the best data to solve a problem; data analysis is about asking effective questions.
  • The data life cycle deals with the stages that data goes through during its useful life; data analysis is the process of analyzing data.
  1. What actions might a data analytics team take in the act phase of the data analysis process? Select all that apply.
  • Sharing analysis results using data visualizations
  • Putting a plan into action to help solve the business problem
  • Validating insights provided by analysts
  • Finalizing a strategy based on the analysis
  1. Fill in the blank: A formula is a set of instructions used to perform a specified calculation; whereas a function is _____.
  • a predefined operation
  • a question written by the user
  • a particular value
  • a computer programming language
  1. Fill in the blank: To request, retrieve, and update information in a database, data analysts use a ____.
  • calculation
  • dashboard
  • query
  • formula
  1. Structured query language (SQL) enables data analysts to communicate with a database.
  • True
  • False
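
The three uses of a query named in the questions above (request, retrieve, and update) can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the customer table, its columns, and its rows are hypothetical and do not come from the course material.

```python
import sqlite3

# In-memory database; the table and its rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO customer (customer_id, city) VALUES (?, ?)",
    [(1, "Raleigh"), (2, "Charlotte")],
)

# Request and retrieve: ask the database for one row and fetch the result.
row = conn.execute(
    "SELECT customer_id, city FROM customer WHERE customer_id = ?", (1,)
).fetchone()

# Update: change a stored value, then read it back to confirm the change.
conn.execute("UPDATE customer SET city = ? WHERE customer_id = ?", ("Cary", 1))
updated = conn.execute(
    "SELECT city FROM customer WHERE customer_id = ?", (1,)
).fetchone()

print(row)
print(updated)
conn.close()
```

The SELECT statement covers requesting and retrieving, while the UPDATE statement changes data that is already stored.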

  1. You are in the plan stage of the data lifecycle for your current project. What action might you take during this stage?
  • Decide what kind of data is needed.
  • Use a formula to perform calculations.
  • Validate insights provided by analysts.
  • Shred paper files.
  1. A data analyst is working at a small tech startup. They’ve just completed an analysis project, which involved private company information about a new product launch. In order to keep the information safe, the analyst uses secure data-erasure software for the digital files and a shredder for the paper files. Which stage of the data life cycle does this describe?
  • Manage
  • Archive
  • Destroy
  • Plan
  1. A data analyst is working at a small tech startup. On their current project they are in the analyze stage of the data life cycle. What might they do in this stage?
  • Choose the format of their spreadsheet headings
  • Determine who is responsible for managing the data
  • Validate the insights provided by analysts
  • Use a formula to perform calculations
  1. Fill in the blank: Data analysis has six process steps whereas the data life cycle has six _____.
  • data analytics tools
  • steps
  • stages
  • key questions
  1. What is the main difference between a formula and a function?
  • A formula can be used multiple times in a spreadsheet; a function can only be used once.
  • A formula begins with an equal sign (=); a function begins with an asterisk (*).
  • A formula is a set of instructions used to perform a specified calculation; a function is a preset command that automatically performs a specified process.
  • A formula is used to add or subtract; a function is used to multiply or divide.
  1. What does a data analyst use to request information within a database?
  • Calculation
  • Dashboard
  • Formula
  • Query
  1. Why is SQL the most popular structured query language? Select all that apply.
  • SQL allows data analysts to use spreadsheets
  • SQL is the most secure database on the market
  • SQL is easy to understand
  • SQL works with a wide variety of databases
  1. A data analyst uses spreadsheets to aggregate data during the capture phase of the data life cycle.
  • True
  • False
  1. Fill in the blank: The data life cycle has six _____ .
  • data analytics tools
  • process steps
  • key questions
  • stages
  1. Fill in the blank: A query is used to _____ information from a database. Select all that apply.
  • request
  • retrieve
  • visualize
  • update
  1. Structured query language (SQL) allows a data analyst to retrieve and request data from a database. What else is SQL used for?
  • Visualizing data within a database
  • The revising phase of the data life cycle
  • Updating databases
  • The sharing phase of the data life cycle
  1. Fill in the blank: A business is determining who should be responsible for the data in their current data analysis project. This means that the company is in the ______ stage of the data life cycle.
  • manage
  • plan
  • analyze
  • capture
  1. Fill in the blank: Shredding paper files and using data-erasure software would be actions taken by a data analyst in the _________ stage of the data lifecycle.
  • Manage
  • Plan
  • Archive
  • Destroy
  1. Fill in the blank: Data analysis has six parts that are divided into distinct _____.
  • process steps
  • key questions
  • data analytics tools
  • stages
  1. Fill in the blank: In the _____ phase of the data analysis process, a data analytics team might validate the insights provided by analysts.
  • process
  • share
  • analyze
  • act
  1. In data analysis, a predefined operation is known as what?
  • A function
  • A formula
  • A particular value
  • A predefined statement
  1. In the course of their current project, a data analyst uses a query to retrieve and request information. Which of the following is a third option they can use a query for?
  • Visualizing data
  • Updating data
  • Deleting data
  • Collecting data
  1. In which stage of the data life cycle does a business decide what kind of data it needs, how the data will be managed, and who will be responsible for it?
  • Manage
  • Plan
  • Capture
  • Analyze
  1. A company takes the insights provided by its data analytics team, validates them, and finalizes a strategy. They then implement a plan to solve the original business problem. This describes the share step of the data analysis process.
  • True
  • False
  1. In the course of their current project, a data analyst uses a query to retrieve and request information. Which of the following are options the analyst can use a query for? Select all that apply.
  • Updating data
  • Collecting data
  • Visualizing data
  • Deleting data
  1. In the plan stage of the data life cycle, what decisions would a data analyst make? Select all that apply.
  • Who will be responsible for the data
  • How the data will be managed
  • What kind of data is needed
  • How the data will be analyzed
  1. In the analyze phase of the data life cycle, what might a data analyst do? Select all that apply.
  • Use spreadsheets to aggregate data
  • Use a formula to perform calculations
  • Create a report from their data
  • Choose the format of their spreadsheet headings
  1. Fill in the blank: In the act phase of the data analysis process, a company may need to _____ the insights of the data analysis team.
  • accomplish
  • revise
  • validate
  • calculate
  1. In data analysis, a function is a predefined operation whereas a formula is a set of instructions used to carry out a specific calculation.
  • True
  • False
  1. A data analyst has finished an analysis project that involved private company data. They erase the digital files in order to keep the information secure. This describes which stage of the data life cycle?
  • Plan
  • Destroy
  • Archive
  • Manage
  1. Fill in the blank: Using a formula to perform calculations, creating a report from their data, and using spreadsheets to aggregate data would all be actions carried out in the ________ stage of the data lifecycle.
  • manage
  • plan
  • analyze
  • capture
  1. Fill in the blank: In the _____ phase of the data analysis process, a data analytics team might validate the insights provided by analysts.
  • process
  • act
  • analyze
  • share
  1. Fill in the blank: Structured query language (SQL) enables data analysts to _____ information from a database. Select all that apply.
  • retrieve
  • visualize
  • request
  • update
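
Several questions in this week contrast a formula (a set of instructions the user writes to perform a specified calculation) with a function (a predefined operation). A rough analogy in Python, assuming three hypothetical cell values in place of a real spreadsheet:

```python
# Hypothetical cell values standing in for spreadsheet cells A1..A3.
cells = {"A1": 10, "A2": 20, "A3": 30}

# Formula: the user spells out the calculation, like =A1+A2+A3.
formula_total = cells["A1"] + cells["A2"] + cells["A3"]

# Function: a predefined operation does the work, like =SUM(A1:A3).
function_total = sum(cells.values())

print(formula_total, function_total)
```

Both lines reach the same total; the difference is whether the steps are written out by hand or supplied by a built-in operation.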

Week 4 – Set up your toolbox

You are working with a database table named employee that contains data about employees. You want to review all the columns in the table.

You write the SQL query below. Add a FROM clause that will retrieve the data from the employee table.

SELECT * FROM employee

What employee has the job title of Sales Manager?

  • Margaret Park
  • Andrew Adams
  • Nancy Edwards
  • Michael Mitchell
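
As a rough illustration of what the completed query does, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the three rows are hypothetical placeholders, not the course's actual employee data; only the `SELECT * FROM employee` statement comes from the question.

```python
import sqlite3

# Hypothetical stand-in for the course's employee table. The column names
# and rows are illustrative placeholders, not the real quiz data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, name TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        (1, "Example Person A", "General Manager"),
        (2, "Example Person B", "Sales Manager"),
        (3, "Example Person C", "IT Manager"),
    ],
)

# SELECT * asks for every column; FROM names the table to read from.
rows = conn.execute("SELECT * FROM employee").fetchall()

# Scan the returned rows for the job title of interest.
sales_managers = [name for (_id, name, title) in rows if title == "Sales Manager"]
print(sales_managers)
```

In the quiz itself you would run the query in the provided SQL tool and read the title column of the result.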

A data analyst creates the following visualization to clearly demonstrate how much more populous Charlotte is than the next-largest North Carolina city, Raleigh. What type of chart do they use?

  • A line chart
  • A column, or bar, chart
  • A scatter chart
  • A pie chart

Fill in the blank: A data analyst has to demonstrate how the population in a city has increased over time. In particular, they want to be able to see when the population has exceeded certain thresholds. The chart that would work best for this is a/an _____ chart.

  • area
  • line
  • column
  • bar
  1. In the following spreadsheet, the column labels in row 1 are called what?
  • Criteria
  • Attributes
  • Descriptors
  • Characteristics
  1. Fill in the blank: In row 8 of the following spreadsheet, you can find the _____ of Cary.
  • format
  • attribute
  • criteria
  • observation
  1. Fill in the blank: In the following spreadsheet, the _____ feature was used to alphabetize the city names in column B.
  • Organize range
  • Name range
  • Randomize range
  • Sort range
  1. A data analyst types =POPULATION(C2:C11) to find the average population of the cities in this spreadsheet. However, they realize they used the wrong formula. What syntax will correct this function?
  • =AVERAGE(C2-C11)
  • AVERAGE(C2:C11)
  • AVERAGE(C2-C11)
  • =AVERAGE(C2:C11)
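
To see why the colon matters in the range, the sketch below mimics the two spellings in Python. The ten values are made-up placeholders for cells C2 through C11, and the comparison with `C2-C11` assumes the spreadsheet evaluates the subtraction first and then averages that single result.

```python
from statistics import mean

# Hypothetical contents of cells C2..C11 (ten made-up population figures).
column_c = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]

# =AVERAGE(C2:C11): the colon means "every cell from C2 through C11".
average_of_range = mean(column_c)

# =AVERAGE(C2-C11): the minus sign subtracts C11 from C2 first, so the
# function averages one number, which is not the column average.
average_of_difference = mean([column_c[0] - column_c[-1]])

print(average_of_range, average_of_difference)
```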
  1. You are working with a database table named genre that contains data about music genres. You want to review all the columns in the table.

You write the SQL query below. Add a FROM clause that will retrieve the data from the genre table.

What is the name of the genre with ID number 3?

  • Jazz
  • Rock
  • Metal
  • Blues
  1. You are working with a database table that contains invoice data. The customer_id column lists the ID number for each customer. You are interested in invoice data for the customer with ID number 35.

You write the SQL query below. Add a WHERE clause that will return only data about the customer with ID number 35.

After you run your query, use the slider to view all the data presented.

What is the billing country for the customer with ID number 35?

  • Ireland
  • Argentina
  • Portugal
  • India
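
The effect of the WHERE clause can be sketched with sqlite3 as follows. The invoice table here has a hypothetical three-column layout and placeholder rows; only the `customer_id` column name and the filter value 35 come from the question.

```python
import sqlite3

# Hypothetical invoice table with placeholder rows (not the quiz dataset).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (invoice_id INTEGER, customer_id INTEGER, billing_country TEXT)"
)
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [
        (1, 35, "Country A"),
        (2, 12, "Country B"),
        (3, 35, "Country A"),
    ],
)

# WHERE keeps only the rows whose customer_id equals 35.
matching = conn.execute("SELECT * FROM invoice WHERE customer_id = 35").fetchall()

# Collect the billing country values that appear in the filtered rows.
billing_countries = {row[2] for row in matching}
print(matching, billing_countries)
```

The same pattern applies to the other customer ID variants of this question; only the number after the equals sign changes.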
  1. A data analyst creates the following visualization to clearly demonstrate how much more populous Charlotte is than the next-largest North Carolina city, Raleigh. What type of chart is it?
  • A scatter chart
  • A column, or bar, chart
  • A line chart
  • A pie chart
  1. A data analyst wants to demonstrate a trend of how something has changed over time. What type of chart is best for this task?
  • Area
  • Column
  • Line
  • Bar

Shuffle Q/A

  1. Fill in the blank: In row 1 of the following spreadsheet, the words rank and name are called _____?
  • attributes
  • characteristics
  • criteria
  • descriptors
  1. In the following spreadsheet, where can you find all of the attributes—also known as the observation—of Fayetteville?
  • Row 7
  • Column B
  • Row 6
  • Cell B7
  1. Fill in the blank: In the following spreadsheet, the feature sort range can be used to ________ the city names in column B?
  • change
  • alphabetize
  • randomize
  • delete
  1. The function =AVERAGE(C2:C11) can be used to do what for the following spreadsheet?
  • Arrange the rows according to increasing population size.
  • Find the city with the largest population.
  • Arrange the rows according to decreasing population size.
  • Find the average population of the cities.
  1. You are working with a database table named employee that contains data about employees. You want to review all the columns in the table.

You write the SQL query below. Add a FROM clause that will retrieve the data from the employee table.

What employee has the job title of Sales Manager?

  • Nancy Edwards
  • Margaret Park
  • Michael Mitchell
  • Andrew Adams
  1. You are working with a database table that contains invoice data. The customer_id column lists the ID number for each customer. You are interested in invoice data for the customer with ID number 40.

You write the SQL query below. Add a WHERE clause that will return only data about the customer with ID number 40.

After you run your query, use the slider to view all the data presented.

What is the billing city for the customer with ID number 40?

  • Paris
  • Dijon
  • London
  • Buenos Aires
  1. A data analyst has to create a visualization that makes it easy to show which of the top ten most populous cities in North Carolina have a population below 250,000 people. What type of chart would be best for this visualization?
  • Line chart
  • Pie chart
  • Bar chart
  • Scatter chart
  1. A data analyst wants to demonstrate how the population in Charlotte has increased over time. They create this data visualization. This is an example of an area chart.
  • True
  • False
  1. In row 1 of the following spreadsheet, the words rank, name, population, and county are called what?
  • Attributes
  • Descriptors
  • Criteria
  • Characteristics
  1. In the following spreadsheet, what feature was used to alphabetize the city names in column B?
  • Organize range
  • Sort range
  • Name range
  • Randomize range
  1. To find the average population of the cities in this spreadsheet, you type =AVERAGE. What is the proper way to type the range that will complete your function?
  • (C2,C11)
  • (C2-C11)
  • (C2:C11)
  • (C2*C11)
  1. You are working with a database table named playlist that contains data about playlists for different types of digital media. You want to review all the columns in the table.

You write the SQL query below. Add a FROM clause that will retrieve the data from the playlist table.

What is the playlist with ID number 3?

  • Audiobooks
  • Music
  • Movies
  • TV Shows
  1. You are working with a database table that contains invoice data. The customer_id column lists the ID number for each customer. You are interested in invoice data for the customer with ID number 28.

You write the SQL query below. Add a WHERE clause that will return only data about the customer with ID number 28.

After you run your query, use the slider to view all the data presented.

What is the billing city for the customer with ID number 28?

  • Bangalore
  • Buenos Aires
  • Dijon
  • Salt Lake City
  1. Which of the following best describes a bar chart?
  • It is a visualization that uses a circle which is divided into wedges sized based on numerical proportion.
  • It is a visualization that plots a sequence of points and connects them with straight lines or curves.
  • It is a visualization that represents data with columns, or bars, the heights of which are proportional to the values that they represent.
  • It is a visualization that plots individual points in the Cartesian coordinate plane.
  1. A data analyst has to create a visualization that clearly shows when and for how long the population of Charlotte has been above one million people. They choose to use a line chart. Why is this the best choice for their visualization?
  • It is a visualization that plots a sequence of points and connects them with straight lines or curves.
  • It is a visualization that uses a circle which is divided into wedges sized based on numerical proportion.
  • It is a visualization that represents data with columns, or bars, the heights of which are proportional to the values that they represent.
  • It is a visualization that plots individual points in the Cartesian coordinate plane.
  1. The words rank, name, population, and county in row 1 of the following spreadsheet are known as descriptors.
  • True
  • False
  1. Fill in the blank: In the following spreadsheet, the ________ of High Point describes all of the data in row 10.
  • criteria
  • dataset
  • observation
  • format
  1. If a data analyst wants to list the cities in this spreadsheet alphabetically, instead of numerically, what feature can they use in column B?
  • Sort range
  • Name range
  • Randomize range
  • Organize range
  1. A data analyst wants to create a visualization that depicts the populations of the top ten most populous cities in North Carolina. What type of chart would be best for this?
  • A pie chart
  • A scatter chart
  • A column, or bar, chart
  • A line chart
  1. A data analyst has to demonstrate a trend of how something has changed over time. What type of chart is best for this task?
  • Line
  • Area
  • Bar
  • Column
  1. You are working with a database table that contains invoice data. The customer_id column lists the ID number for each customer. You are interested in invoice data for the customer with ID number 54.

You write the SQL query below. Add a WHERE clause that will return only data about the customer with ID number 54.

After you run your query, use the slider to view all the data presented.

What is the billing address for the customer with ID number 54?

  • 1033 N Park Ave
  • 230 Elgin St
  • 110 Raeburn Pl
  • 801 W 4th St
  1. Fill in the blank: A data analyst creates a table, but they realize this isn’t the best visualization for their data. To fix the problem, they decide to use the ____ feature to change it to a column chart.
  • chart editor
  • rename
  • filter view
  • image
  1. You are working with a database table named employee that contains data about employees. You want to review all the columns in the table.

You write the SQL query below. Add a FROM clause that will retrieve the data from the employee table.

What is the job title of Andrew Adams?

  • General Manager
  • Sales Manager
  • Sales Support Agent
  • IT Manager
  1. Fill in the blank: Suppose you wanted to determine the average population of the cities in the following spreadsheet. The correct function syntax to use would be ________ .
  • =AVERAGE(C2-C11)
  • AVERAGE(D2:D11)
  • AVERAGE(C2:C11)
  • =AVERAGE(C2:C11)

Week 5 – Endless career possibilities

A college IT department needs to reduce the number of computers on campus for student use. How could a data analyst help identify a solution to this problem?

  • Analyze the number of classes scheduled across all classrooms
  • Analyze the utilization of the computer labs on campus
  • Analyze data on the number of students enrolled
  • Analyze the square footage of all computer labs on campus

In data analytics, what is the term for an obstacle to be solved?

  • Issue
  • Question
  • Problem
  • Solution
  1. An online gardening magazine wants to understand why its subscriber numbers have been increasing. A data analyst discovers that significantly more people subscribe when the magazine has its annual 50%-off sale. This is an example of what?
  • Analyzing consumer preferences using artificial intelligence
  • Analyzing customer buying behaviors
  • Analyzing social media engagement
  • Analyzing the number of customers by calculating daily foot traffic
  1. Fill in the blank: A doctor’s office has discovered that patients are waiting 20 minutes longer for their appointments than in past years. To help solve this problem, a data analyst could investigate how many nurses are on staff at a given time compared to the number of _____.
  • doctors seeing new patients
  • patients with appointments
  • negative comments about the wait times on social media
  • doctors on staff at the same time
  1. A problem is an obstacle to be solved, an issue is a topic to investigate, and a question is designed to discover information.
  • True
  • False
  1. What is a question or problem that a data analyst answers for a business?
  • Mission statement
  • Hypothesis
  • Complaint
  • Business task
  1. Fill in the blank: Data-driven decision-making is described as using _____ to guide business strategy.
  • gut instinct
  • visualizations
  • facts
  • intuition
  1. It’s possible for conclusions drawn from data analysis to be both true and unfair.
  • True
  • False
  1. A data analyst is analyzing fruit and vegetable sales at a grocery store. They’re able to find data on everything except red onions. What’s the best course of action?
  • Ask a teammate for help finding data on red onions.
  • Exclude red onions from the analysis.
  • Exclude all onion varieties from the analysis.
  • Use the data on white onions instead, as they’re both onion varieties.
  1. Collaborating with a social scientist to provide insights into human bias and social contexts is an effective way to avoid bias in your data.
  • True
  • False

Shuffle Q/A

  1. A restaurant hires a data analyst to determine the best times to have the restaurant open. Which of the following methods can the data analyst use to help build a better schedule for the restaurant? Select all that apply.
  • Analyze weekly weather data
  • Analyze staffing levels for different days
  • Examine hourly customer numbers
  • Survey customers on their preferred times to dine
  1. A restaurant has noticed that customers often wait longer in line than in previous years. How could a data analyst help solve this problem?
  • Analyze the average sales amount per customer
  • Analyze customer survey results about the preferred opening hours of the restaurant
  • Analyze the number of staff on shift at any time
  • Analyze the products customers are purchasing
  1. Fill in the blank: A business task is described as the _____ a data analyst answers for a business.
  • solution
  • complaint
  • question
  • comment
  1. When you make decisions using observation and intuition as a guide, you only see part of the picture. What can improve your decision-making?
  • Using data
  • Using assumptions
  • Creating surveys
  • Being decisive
  1. Data analysts ensure their analysis is fair for what reason?
  • Fairness helps them avoid biased conclusions.
  • Fairness helps them stay organized.
  • Fairness helps them communicate with stakeholders.
  • Fairness helps them pick and choose which data to include from a dataset.
  1. A large hotel chain sees about 500 customers per week. A data analyst working there is gathering data through customer satisfaction surveys. They are anxious to begin analysis, so they start analyzing the data as soon as they receive 50 survey responses. This is an example of what? Select all that apply.
  • Failing to include diverse perspectives in data collection
  • Failing to collect data anonymously
  • Failing to reward customers for participating in the survey
  • Failing to have a large enough sample size
  1. An online gardening magazine wants to understand why its subscriber numbers have been increasing. What kind of reports can a data analyst provide to help answer that question? Select all that apply.
  • Reports that describe how many customers shared positive comments about the gardening magazine on social media in the past year
  • Reports that predict the success of sales leads to secure future subscribers
  • Reports that examine how a recent 50%-off sale affected the number of subscription purchases
  • Reports that compare past weather patterns to the number of people asking gardening questions on their social media
  1. Fill in the blank: In data analytics, a question is _____.
  • an obstacle or complication that needs to be worked out
  • a way to discover information
  • a topic to investigate
  • a subject to analyze
  1. What must a data analyst establish before they can start to plan the best approach to gather and analyze information?
  • The business task
  • The statement
  • The complaint
  • The solution
  1. What is the process of using facts to guide business strategy?
  • Data-driven decision-making
  • Data ethics
  • Data visualization
  • Data programming
  1. A data analyst is developing a model. They start by gathering data for groups that are underrepresented in a sample. What strategy could they employ to ensure these groups are represented fairly?
  • Oversample the underrepresented group
  • Sample the underrepresented group normally
  • Combine the underrepresented group with another group
  • Exclude the underrepresented group from the sample
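
One simple form of oversampling is random oversampling with replacement, sketched below. This is an assumption about how the idea could be carried out, not a method the course prescribes; the function name `oversample`, the `group` key, and the sample records are all hypothetical.

```python
import random


def oversample(records, group_key, seed=0):
    """Repeat randomly chosen records from smaller groups until every
    group has as many records as the largest group."""
    if not records:
        return []
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Bucket the records by their group label.
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record)

    target = max(len(members) for members in groups.values())

    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Draw extra records (with replacement) to reach the target size.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical sample: group B is underrepresented (2 records versus 8).
sample = [{"group": "A"}] * 8 + [{"group": "B"}] * 2
balanced = oversample(sample, "group")
counts = {g: sum(1 for r in balanced if r["group"] == g) for g in ("A", "B")}
print(counts)
```

Duplicating records does not add new information, so in practice this is usually paired with collecting more data from the underrepresented group where possible.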
  1. A restaurant is trying to develop more effective staffing strategies. A data analyst recognizes that there are significantly fewer customers earlier in the business day. They conclude that opening later would be more effective for staffing. What is this an example of?
  • Creating efficiencies by analyzing customer foot traffic
  • Tailoring products to consumer buying habits
  • Creating more effective customer communication
  • Gathering customer opinions about business changes
  1. A restaurant has noticed many popular dishes are running out early in the day. How could a data analyst help identify a solution to this problem? Select all that apply.
  • Analyze ordering patterns of those products
  • Examine the number of sales of those products
  • Examine overall daily sales of the restaurant
  • Analyze the number of staff on shift during peak times
  1. When working for a restaurant, a data analyst is asked to examine and report on the daily sales data from year to year to help with making more efficient staffing decisions. What is this an example of?
  • A business task
  • An issue
  • A solution
  • A breakthrough
  1. Data-driven decision-making is using facts to guide business strategy. The benefits include which of the following? Select all that apply.
  • Getting a complete picture of a problem and its causes
  • Combining observation with objective data
  • Using data analytics to find the best possible solution to a problem
  • Making the most of intuition and gut instinct
  1. A data analyst is analyzing fruit and vegetable sales at a grocery store. They’re able to find data on everything except red onions. If they exclude red onions from the analysis, this would be an example of creating or reinforcing bias.
  • True
  • False
  1. A hotel is trying to gather data on their guests' satisfaction with their stay. Which of the following options would best help the hotel account for potential bias in their data?
  • Surveying guests at random times throughout the year
  • Only surveying guests who have booked their stay through a certain third-party website
  • Only surveying guests who have stayed at the hotel during peak season
  • Only surveying guests who have stayed at the hotel for more than 3 nights
  1. A restaurant is struggling to accurately staff for the different daily customer volumes. On some days, there are many servers and few customers. On other days, the restaurant is very busy and there are not enough servers and kitchen staff. What reports could a data analyst use to create more efficient staffing strategies? Select all that apply.
  • Reports of past and future reservations
  • Reports of past weather patterns in the area of the restaurant
  • Reports using historical sales data to predict sales for the current day/date
  • Reports of planned local events in the area of the restaurant
  1. Fill in the blank: In data analytics, a topic to investigate is also known as a(n) _____.
  • theme
  • issue
  • question
  • statement
  1. When a choice is made between good, bad, or a combination of consequences based on facts, it is also known as what?
  • Data-driven decision-making
  • Data ethics
  • Data visualization
  • Data programming
  1. At what point in the data analysis process should a data analyst consider fairness?
  • When decisions are made based on the conclusions
  • When data collection begins
  • When data is being organized for reporting
  • When conclusions are presented
  1. A restaurant is considering changing their operating hours. They survey customers that come in between 4 p.m. and 5 p.m. to get feedback on this potential change. What can the restaurant do to ensure the data analysis process is fair?
  • Expand the times when they survey customers
  • Survey only repeat customers
  • Reward customers for participating in the survey
  • Survey people walking by on the street
  1. A doctor’s office discovers that patients are waiting 20 minutes longer for their appointments than in past years. In what ways could a data analyst help solve this problem? Select all that apply.
  • Analyze the average length of an appointment this year compared to past years.
  • Analyze the number of patients seen per day compared to past years.
  • Analyze a recent change in the average rating for the doctor’s office on social media.
  • Analyze how many doctors and nurses are on staff at a given time compared to the number of patients with appointments.
  1. Fill in the blank: Fairness is achieved when data analysis doesn’t create or _____ bias.
  • reinforce
  • constrain
  • highlight
  • resolve
  1. A gym wants to start offering exercise classes. A data analyst plans to survey 10 people to determine which classes would be most popular. To ensure the data collected is fair, what steps should they take? Select all that apply.
  • Ensure participants represent a variety of profiles and backgrounds.
  • Collect data anonymously.
  • Survey only people who don’t currently go to the gym.
  • Increase the number of participants.
  1. A doctor’s office has discovered that patients are waiting 20 minutes longer for their appointments than in past years. A data analyst could help solve this problem by analyzing how many doctors and nurses are on staff at a given time compared to the number of patients with appointments.
  • True
  • False
  1. Fill in the blank: Once an analyst has identified a problem for a business, they establish a(n) _____ to help inform the process of gathering the correct information.
  • issue
  • business task
  • statement
  • solution
  1. Which of the following best describes what fairness in data analytics means?
  • Ensuring that analysis does not create or reinforce bias
  • Including data from dominant groups
  • Collecting data objectively
  • Including self-reported data

Course Challenge

Scenario 1, questions 1-5

You’ve just started a new job as a data analyst for a midsized pharmacy chain with 38 stores in the American Southwest. Your supervisor shares a new data analysis project with you.

She explains that the pharmacy is considering discontinuing a bubble bath product called Splashtastic. Your supervisor wants you to analyze sales data and determine what percentage of each store’s total daily sales come from that product. Then, you’ll present your findings to leadership.

You know that it's important to follow each step of the data analysis process: ask, prepare, process, analyze, share, and act. So, you begin by defining the problem and making sure you fully understand stakeholder expectations.

One of the questions you ask is where to find the dataset you’ll be working with. Your supervisor explains that the company database has all the information you need.

Next, you continue to the prepare step. You access the database and write a query to retrieve data about Splashtastic. You notice that there are only 38 rows of data, representing the company’s 38 stores. In addition, your dataset contains five columns: Store Number, Average Daily Customers, Average Daily Splashtastic Sales (Units), Average Daily Splashtastic Sales (Dollars), and Average Total Daily Sales (All Products). You decide to use a spreadsheet to work with the data because you know that spreadsheets work well for processing and analyzing a small dataset, like the one you’re using.

Fill in the blank: To get the data from the database into a spreadsheet, you would first _____ the data as a .CSV file, then import it into a spreadsheet.

  • email
  • download
  • copy and paste
  • print
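
The download-then-import round trip can be sketched in Python with the built-in sqlite3 and csv modules. The table name `splashtastic_sales`, its two columns, and the two rows are hypothetical; an in-memory buffer stands in for the downloaded .CSV file.

```python
import csv
import io
import sqlite3

# Hypothetical database table with placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE splashtastic_sales (store_number INTEGER, avg_daily_sales_dollars REAL)"
)
conn.executemany(
    "INSERT INTO splashtastic_sales VALUES (?, ?)",
    [(1, 10.0), (2, 12.5)],
)

# "Download" step: run the query and write the result as CSV text.
cursor = conn.execute("SELECT * FROM splashtastic_sales")
header = [column[0] for column in cursor.description]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)   # column names become the first CSV row
writer.writerows(cursor)  # one CSV row per database row
csv_text = buffer.getvalue()

# "Import" step: a spreadsheet program parses the same text back into rows.
imported = list(csv.reader(io.StringIO(csv_text, newline="")))
print(imported)
```

Note that every value comes back as text after the round trip; a spreadsheet usually converts numeric-looking cells on import.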

Scenario 1 continued

You’ve downloaded the data from your company database and imported it into a spreadsheet. IMPORTANT: To answer questions using this dataset for the scenario, click the link below and select the “Use Template” button before answering the questions.

Link to template: Course Challenge - Scenario 1

OR

If you don’t have a Google account, you can download the template directly from the attachment below.

Course Challenge Dataset - Scenario 1 - Scenario 1_ Pharmacy Data - Part 1

CSV File

Now, it’s time to process the data. As you know, this step involves finding and eliminating errors and inaccuracies that can get in the way of your results. While cleaning the data, you notice that information about Splashtastic is missing for Store Number 15 in Row 16. Which of the following would be an appropriate course of action?

  • Delete the row with the missing data point.
  • Replace the row with the average values of the other data points.
  • Sort the spreadsheet so the row with missing data is at the bottom.
  • Investigate previous projects and see how this was dealt with there.

Scenario 1 continued

Once you’ve found the missing information, you analyze your dataset.

During analysis, you create a new column F. At the top of the column, you add: Average Percentage of Total Sales - Splashtastic. What is this column label called?

  • A title
  • A reference
  • An attribute
  • A headline
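
The values that would go under this new column label can be sketched as a simple percentage calculation. The function name `percent_of_total` and the two store rows are hypothetical. The sketch assumes column D holds Average Daily Splashtastic Sales (Dollars) and column E holds Average Total Daily Sales (All Products), following the column order given in the scenario.

```python
def percent_of_total(product_sales: float, total_sales: float) -> float:
    """Return product sales as a percentage of total sales."""
    if total_sales == 0:
        raise ValueError("total sales must be non-zero")
    return 100 * product_sales / total_sales


# Hypothetical rows: (store number, column D dollars, column E dollars).
stores = [
    (1, 50.0, 1000.0),
    (2, 30.0, 1500.0),
]

# One value per store, as column F would hold one value per row.
column_f = [percent_of_total(d, e) for (_store, d, e) in stores]
print(column_f)
```

In a spreadsheet the equivalent per-row formula would be along the lines of `=D2/E2*100`, filled down the column.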

Scenario 1 continued

Next, you determine the average total daily sales over the past 12 months at all stores. The entire range of cells that contains these sales is E2:E39. The correct syntax is =AVERAGE(E2:E39).

  • True
  • False

Scenario 1 continued

Fill in the blank: You’ve reached the share phase of the data analysis process. One of the things that you can do in this phase is to prepare a _____ about Splashtastic’s sales and practice your presentation.

  • prediction
  • finding
  • record
  • slideshow

Scenario 2, questions 6-10

You’ve been working for the nonprofit National Dental Society (NDS) as a junior data analyst for about two months. The mission of the NDS is to help its members advance the oral health of their patients. NDS members include dentists, hygienists, and dental office support staff.

The NDS is passionate about patient health. Part of this involves automatically scheduling follow-up appointments after crown replacement, emergency dental surgery, and extraction procedures. NDS believes the follow-up is an important step to ensure patient recovery and minimize infection.

Unfortunately, many patients don’t show up for these appointments, so the NDS wants to create a campaign to help its members learn how to encourage their patients to take follow-up appointments seriously. If successful, this will help the NDS achieve its mission of advancing the oral health of all patients.

Your supervisor has just sent you an email saying that you’re doing very well on the team, and he wants to give you some additional responsibility. He describes the issue of many missed follow-up appointments. You are tasked with analyzing data about this problem and presenting your findings using data visualizations.

An NDS member with three dental offices in Colorado offers to share its data on missed appointments. So, your supervisor uses a database query to access the dataset from the dental group. The query instructs the database to retrieve all patient information from the member’s three dental offices, located in zip code 81137.

The table is dental_data_table, and the column name is zip_code. You write the following query, but get an error. What statement will correct the problem?

SELECT * FROM dental_data_table WHERE zip code = 81137

  • zip_code = 81137
  • WHERE_zip code = 81137
  • WHERE zip_code = 81137
  • WHERE 81137
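
The difference between the two spellings of the column name can be checked with a small sqlite3 sketch. The table and column names come from the question; the `patient_id` column and the two rows are hypothetical placeholders, and the exact error message depends on the database engine.

```python
import sqlite3

# Placeholder table using the names from the question plus a made-up
# patient_id column and two made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dental_data_table (patient_id INTEGER, zip_code INTEGER)")
conn.executemany(
    "INSERT INTO dental_data_table VALUES (?, ?)",
    [(1, 81137), (2, 99999)],
)

broken_query = "SELECT * FROM dental_data_table WHERE zip code = 81137"
fixed_query = "SELECT * FROM dental_data_table WHERE zip_code = 81137"

# The space splits the column name into two tokens, so the query is rejected.
try:
    conn.execute(broken_query)
    broken_query_failed = False
except sqlite3.OperationalError as error:
    broken_query_failed = True
    print("broken query rejected:", error)

# With the underscore restored, the filter matches the zip_code column.
fixed_rows = conn.execute(fixed_query).fetchall()
print(fixed_rows)
```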

Scenario 2 continued

The dataset your supervisor retrieved and imported into a spreadsheet includes a list of patients, their demographic information, dental procedure types, and whether they attended their follow-up appointment. To use the dataset for this scenario, click the link below and select “Use Template.”

Link to template: Course Challenge - Scenario 2

OR

If you don’t have a Google account, you can download the template directly from the attachment below.

Course Challenge Dataset - Scenario 2

CSV File

The patient demographic information includes data such as age, gender, and home address. When examining the geographic data, you notice that all the patients live in the same zip code.

Fill in the blank: The fact that the dataset includes people who all live in the same zip code might get in the way of ______.

  • fairness
  • accuracy
  • spreadsheet formulas or functions
  • data visualization

Scenario 2 continued

As you’re reviewing the dataset, you notice that there are a disproportionate number of senior citizens. So, you investigate further and find out that this zip code represents a rural community in Colorado with about 800 residents. In addition, there’s a large assisted-living facility in the area. Nearly 300 of the residents in the 81137 zip code live in the facility.

You recognize that’s a sizable number, so you want to find out if age has an effect on a patient’s likelihood to attend a follow-up dental appointment. You analyze the data, and your analysis reveals that older people tend to miss follow-ups more than younger people.

So, you do some research online and discover that people over the age of 60 are 50% more likely to miss dentist appointments. Sometimes this is because they’re on a fixed income. Also, many senior citizens lack transportation to get to and from appointments.

With this new knowledge, you write an email to your supervisor expressing your concerns about the dataset. He agrees with your concerns, but he’s also impressed with what you’ve learned and thinks your findings could be very important to the project. He asks you to change the business task. Now, the NDS campaign will be about educating dental offices on the challenges faced by senior citizens and finding ways to help them access quality dental care.

Fill in the blank: Changing the business task involves defining a new _____.

  • gap analysis plan
  • graphical representation of the data
  • question or problem to be solved
  • data-cleaning strategy

Scenario 2 continued

You continue with your analysis. In the end, your findings support what you discovered during your online research: As people get older, they’re less likely to attend follow-up dental visits.

But you’re not done yet. You know that data should be combined with human insights in order to lead to true data-driven decision-making. So, your next step is to share this information with people who are familiar with the problem professionally. They’ll help verify the results of your data analysis.

Fill in the blank: The people who are familiar with a problem and help verify the results of data analysis are _____.

  • customers
  • data scientists
  • stakeholders
  • subject-matter experts

Scenario 2 continued

The subject-matter experts are impressed by your analysis. The team agrees to move to the next step: data visualization. You know it’s important that stakeholders at NDS can quickly and easily understand that older people are less likely to attend important follow-up dental appointments than younger people. This will help them create an effective campaign for members.

It’s time to create your presentation to stakeholders. It will include a data visualization that demonstrates the lifetime trend of people being less likely to attend follow-up appointments as they get older.

Why would a line chart be the most effective in representing this?

  • Line charts are effective in displaying points in series.
  • Line charts arrange data values into rows.
  • Line charts represent data values as proportionally sized wedges.
  • Line charts arrange data values into columns.

Scenario 1, questions 1-5

You’ve just started a new job as a data analyst. You’re working for a midsized pharmacy chain with 38 stores in the American Southwest. Your supervisor shares a new data analysis project with you.

She explains that the pharmacy is considering discontinuing a bubble bath product called Splashtastic. Your supervisor wants you to analyze sales data and determine what percentage of each store’s total daily sales come from that product. Then, you’ll present your findings to leadership.

You know that it's important to follow each step of the data analysis process: ask, prepare, process, analyze, share, and act. So, you begin by defining the problem and making sure you fully understand stakeholder expectations.

One of the questions you ask is where to find the dataset you’ll be working with. Your supervisor explains that the company database has all the information you need.

Next, you continue to the prepare step. You access the database and write a query to retrieve data about Splashtastic. You notice that there are only 38 rows of data, representing the company’s 38 stores. In addition, your dataset contains five columns: Store Number, Average Daily Customers, Average Daily Splashtastic Sales (Units), Average Daily Splashtastic Sales (Dollars), and Average Total Daily Sales (All Products).

You know that spreadsheets work well for processing and analyzing a small dataset, like the one you’re using. To get the data from the database into a spreadsheet, what should you do?

  • Email a copy of the dataset to your company email address.
  • Use Tableau to convert the data into a spreadsheet.
  • Download the data as a .CSV file, then import it into a spreadsheet.
  • Copy and paste the data into a spreadsheet.

Scenario 1 continued

You’ve downloaded the data from your company database and imported it into a spreadsheet. IMPORTANT: To answer questions using this dataset for the scenario, click the link below and select the “Use Template” button before answering the questions.

Link to template: Course Challenge - Scenario 1

OR

If you don’t have a Google account, you can download the template directly from the attachment below.

Course Challenge Dataset - Scenario 1 - Scenario 1_ Pharmacy Data - Part 1

CSV File

Now, it’s time to process the data. As you know, this step involves finding and eliminating errors and inaccuracies that can get in the way of your results. While cleaning the data, you notice that information about Splashtastic is missing for Store Number 15 in Row 16. Which of the following would be an appropriate response?

  • Sort the spreadsheet so the row with missing data is at the bottom.
  • Ask a colleague on your team how they've handled similar issues in the past.
  • Delete the row with the missing data point.
  • Replace the row with the average values of the other data points.

Scenario 1 continued

Once you’ve found the missing information, you analyze your dataset. During analysis, you create a new column F. At the top of the column, you add the attribute Average Percentage of Total Sales - Splashtastic.

Fill in the blank: An attribute is a _______ or quality of data used to label a column.

  • number
  • headline
  • response
  • characteristic
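
The new column F described above holds Splashtastic's share of each store's total daily sales. A minimal sketch of that calculation follows. It assumes the dollar sales sit in column D and the total sales in column E (which matches the column order listed in the scenario), and the two sample rows are hypothetical, not values from the course template.

```python
# Hypothetical sample rows; the real figures live in the course template.
stores = [
    {"store_number": 1, "splashtastic_dollars": 50.0, "total_daily_sales": 1000.0},
    {"store_number": 2, "splashtastic_dollars": 30.0, "total_daily_sales": 1200.0},
]


def percent_of_total(part, total):
    """Return `part` as a percentage of `total`."""
    return part / total * 100


# Add the new attribute to every row, like filling down column F.
for row in stores:
    row["pct_of_total_sales"] = percent_of_total(
        row["splashtastic_dollars"], row["total_daily_sales"]
    )
```

In a spreadsheet, the equivalent under the same column assumption would be a formula such as =D2/E2 filled down the column and formatted as a percentage.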

Scenario 1 continued

Next, you determine the average total daily sales over the past 12 months at all stores. The entire range of cells that contains these sales is E2:E39. Identify the correct way to write your function.

  • =AVERAGE(E2+E39)
  • =AVERAGE(E2,E39)
  • =AVERAGE(E2:E39)
  • =AVERAGE(E2-E39)
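
The options above differ in how they refer to cells: a colon selects every cell in the range, while a comma lists only the two cells named. The sketch below imitates that difference in Python. The dictionary standing in for column E and the helper cell_range are hypothetical, and the values are made up for illustration.

```python
from statistics import mean

# Hypothetical stand-in for column E: row number -> value, rows 2 through 39.
column_e = {row: float(row * row) for row in range(2, 40)}


def cell_range(column, first_row, last_row):
    """Return every value from first_row through last_row, like E2:E39."""
    return [column[row] for row in range(first_row, last_row + 1)]


# Like =AVERAGE(E2:E39): averages all 38 cells in the range.
range_average = mean(cell_range(column_e, 2, 39))

# Like =AVERAGE(E2,E39): averages only the first and last cell.
endpoints_average = mean([column_e[2], column_e[39]])
```

With this sample data the two results differ, which is why the separator matters when the goal is an average over all stores.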

Scenario 1 continued

You’ve reached the share phase of the data analysis process. It involves which of the following? Select all that apply.

  • Present your findings about Splashtastic to stakeholders.
  • Prepare a slideshow about Splashtastic’s sales and practice your presentation.
  • Create a data visualization to highlight the Splashtastic sales insights you've discovered.
  • Stop selling Splashtastic because it doesn't represent a large percentage of total sales.

Scenario 2, questions 6-10

You’ve been working for the nonprofit National Dental Society (NDS) as a junior data analyst for about two months. The mission of the NDS is to help its members advance the oral health of their patients. NDS members include dentists, hygienists, and dental office support staff.

The NDS is passionate about patient health. Part of this involves automatically scheduling follow-up appointments after crown replacement, emergency dental surgery, and extraction procedures. NDS believes the follow-up is an important step to ensure patient recovery and minimize infection.

Unfortunately, many patients don’t show up for these appointments, so the NDS wants to create a campaign to help its members learn how to encourage their patients to take follow-up appointments seriously. If successful, this will help the NDS achieve its mission of advancing the oral health of all patients.

Your supervisor has just sent you an email saying that you’re doing very well on the team, and he wants to give you some additional responsibility. He describes the issue of many missed follow-up appointments. You are tasked with analyzing data about this problem and presenting your findings using data visualizations.

An NDS member with three dental offices in Colorado offers to share its data on missed appointments. So, your supervisor uses a database query to access the dataset from the dental group. The query instructs the database to retrieve all patient information from the member’s three dental offices, located in zip code 81137.

The table is dental_data_table, and the column name is zip_code. You have written the following query, but received an error when it ran.

SELECT *
FROM dental_data_table
WHERE dental_data_table = 81137

Given the objective of the query, where is the mistake in this query?

  • SELECT, FROM, and WHERE should not be capitalized.
  • In line 2, dental_data_table should be replaced with zip_code 81137.
  • The third line should be WHERE = 81137
  • In line 3, dental_data_table should be replaced with zip_code.
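
To see why the WHERE clause has to name a column, the query can be tried against a small in-memory SQLite database. This is only a sketch: the table and column names come from the scenario, but the patient_id column, the sample rows, and the choice of SQLite are assumptions, since the scenario does not say which database the NDS member uses.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dental_data_table (patient_id INTEGER, zip_code INTEGER)"
)
conn.executemany(
    "INSERT INTO dental_data_table VALUES (?, ?)",
    [(1, 81137), (2, 81137), (3, 80202)],
)

# Filtering on the table name fails, because it is not a column.
try:
    conn.execute(
        "SELECT * FROM dental_data_table WHERE dental_data_table = 81137"
    )
    table_name_error = None
except sqlite3.OperationalError as exc:
    table_name_error = str(exc)

# Filtering on the zip_code column returns only the matching patients.
matching_rows = conn.execute(
    "SELECT * FROM dental_data_table WHERE zip_code = 81137"
).fetchall()
```

The exact error text depends on the database engine, so the sketch only records that an error occurred rather than relying on its wording.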

Scenario 2 continued

The dataset your supervisor retrieved and imported into a spreadsheet includes a list of patients, their demographic information, dental procedure types, and whether they attended their follow-up appointment. To use the dataset for this scenario, click the link below and select “Use Template.”

Link to template: Course Challenge - Scenario 2

OR

If you don’t have a Google account, you can download the template directly from the attachment below.

Course Challenge Dataset - Scenario 2

CSV File

The patient demographic information includes data such as age, gender, and home address. When examining the geographic data, you notice that all the patients live in the same zip code.

Fill in the blank: The fact that the dataset includes people who all live in the same zip code might get in the way of ______.

  • fairness
  • accuracy
  • spreadsheet formulas or functions
  • data visualization

Scenario 2 continued

As you’re reviewing the dataset, you notice that there are a disproportionate number of senior citizens. So, you investigate further and find out that this zip code represents a rural community in Colorado with about 800 residents. In addition, there’s a large assisted-living facility in the area. Nearly 300 of the residents in the 81137 zip code live in the facility.

You recognize that’s a sizable number, so you want to find out if age has an effect on a patient’s likelihood to attend a follow-up dental appointment. You analyze the data, and your analysis reveals that older people tend to miss follow-ups more than younger people.

So, you do some research online and discover that people over the age of 60 are 50% more likely to miss dentist appointments. Sometimes this is because they’re on a fixed income. Also, many senior citizens lack transportation to get to and from appointments.

With this new knowledge, you write an email to your supervisor expressing your concerns about the dataset. He agrees with your concerns, but he’s also impressed with what you’ve learned and thinks your findings could be very important to the project. He asks you to change the business task. Now, the NDS campaign will be about educating dental offices on the challenges faced by senior citizens and finding ways to help them access quality dental care.

The business task has changed. What is the nature of that change?

  • Creating a graphical representation of the data
  • Using a database instead of a spreadsheet
  • Conducting a gap analysis
  • Defining the new question or problem to be solved

Scenario 2 continued

You continue with your analysis. In the end, your findings support what you discovered during your online research: As people get older, they’re less likely to attend follow-up dental visits.

But you’re not done yet. You know that data should be combined with human insights in order to lead to true data-driven decision-making. So, your next step is to share this information with people who are familiar with the problem professionally. They’ll help verify the results of your data analysis.

Fill in the blank: The people who are familiar with a problem and help verify the results of data analysis are _____.

  • stakeholders
  • subject-matter experts
  • customers
  • data scientists

Scenario 2 continued

The subject-matter experts are impressed by your analysis. The team agrees to move to the next step: data visualization. You know it’s important that stakeholders at NDS can quickly and easily understand that older people are less likely to attend important follow-up dental appointments than younger people. This will help them create an effective campaign for members.

It’s time to create your presentation to stakeholders. It will include a data visualization that demonstrates the lifetime trend of people being less likely to attend follow-up appointments as they get older.

Which type of chart will be most effective?

  • A doughnut chart
  • A table
  • A pie chart
  • A line chart

Scenario 1 continued

You’ve downloaded the data from your company database and imported it into a spreadsheet. IMPORTANT: To answer questions using this dataset for the scenario, click the link below and select the “Use Template” button before answering the questions.

Link to template: Course Challenge - Scenario 1

OR

If you don’t have a Google account, you can download the template directly from the attachment below.

Course Challenge Dataset - Scenario 1 - Scenario 1_ Pharmacy Data - Part 1

CSV File

Now, it’s time to process the data. As you know, this step involves finding and eliminating errors and inaccuracies that can get in the way of your results. While cleaning the data, you notice there’s missing data in one of the rows. What might you do to fix this problem? Select all that apply.

  • Ask a colleague on your team how they've handled similar issues in the past
  • Sort the spreadsheet so the row with missing data is at the bottom
  • Delete the row with the missing data point
  • Ask your supervisor for guidance

Scenario 1 continued

Next, you determine the average total daily sales over the past 12 months at all stores. The entire range of cells that contains these sales is E2:E39. To do this, you use a function. You input =AVE(E2:E39), but this returns an error. What is the correct command?

  • =AVERAGE(E2:E39)
  • =AVERAGE(E2+E39)
  • =AVERAGE(E2,E29)
  • =AVERAGE(E2-E39)

Scenario 2, questions 6-10

You’ve been working for the nonprofit National Dental Society (NDS) as a junior data analyst for about two months. The mission of the NDS is to help its members advance the oral health of their patients. NDS members include dentists, hygienists, and dental office support staff.

The NDS is passionate about patient health. Part of this involves automatically scheduling follow-up appointments after crown replacement, emergency dental surgery, and extraction procedures. NDS believes the follow-up is an important step to ensure patient recovery and minimize infection.

Unfortunately, many patients don’t show up for these appointments, so the NDS wants to create a campaign to help its members learn how to encourage their patients to take follow-up appointments seriously. If successful, this will help the NDS achieve its mission of advancing the oral health of all patients.

Your supervisor has just sent you an email saying that you’re doing very well on the team, and he wants to give you some additional responsibility. He describes the issue of many missed follow-up appointments. You are tasked with analyzing data about this problem and presenting your findings using data visualizations.

An NDS member with three dental offices in Colorado offers to share its data on missed appointments. So, your supervisor uses a database query to access the dataset from the dental group. The query instructs the database to retrieve all patient information from the member’s three dental offices, located in zip code 81137.

The table is dental_data_table, and the column name is zip_code. You write the following query.

SELECT *
FROM dental_data_table
WHERE zip code = 81137

This query is incorrect. How could it be fixed?

  • In line 3, replace zip code with zip_code
  • Decapitalize SELECT, FROM, and WHERE
  • Rewrite line 3 as WHERE_zip code = 81137
  • Rewrite line 3 as zip_code = 81137

Scenario 2 continued

The dataset your supervisor retrieved and imported into a spreadsheet includes a list of patients, their demographic information, dental procedure types, and whether they attended their follow-up appointment. To use the dataset for this scenario, click the link below and select “Use Template.”

Link to template: Course Challenge - Scenario 2

OR

If you don’t have a Google account, you can download the template directly from the attachment below.

Course Challenge Dataset - Scenario 2

CSV File

The patient demographic information includes data such as age, gender, and home address. You review the demographic data, paying particular attention to geography. What geographic aspect of the data may negatively impact fairness?

  • The patients all live in the same city.
  • The patients all live in houses.
  • The patients all live in the same country.
  • The patients all live in the same zip code.

Scenario 2 continued

You continue with your analysis. In the end, your findings support what you discovered during your online research: As people get older, they’re less likely to attend follow-up dental visits.

But you’re not done yet. You know that data should be combined with human insights in order to lead to true data-driven decision-making. So, your next step is to share this information with people who are familiar with the problem professionally. They’ll help verify the results of your data analysis.

Fill in the blank: Subject matter experts are people who are familiar with a problem. They can help by identifying inconsistencies in the analysis, _____, and validating the choices being made.

  • redefining the business problem
  • offering insights into the business problem
  • creating a presentation with the data
  • collecting data relevant to the business problem

Scenario 2 continued

The subject-matter experts are impressed by your analysis. The team agrees to move to the next step: data visualization. You know it’s important that stakeholders at NDS can quickly and easily understand that older people are less likely to attend important follow-up dental appointments than younger people. This will help them create an effective campaign for members.

It’s time to create your presentation to stakeholders. It will include a data visualization that demonstrates the lifetime trend of people being less likely to attend follow-up appointments as they get older.

Fill in the blank: The type of chart that would be most effective in visualizing this is a _____.

  • bar chart
  • pie chart
  • doughnut chart
  • line chart

Course 2 – Ask Questions to Make Data-Driven Decisions

Week 1 – Effective questions

In structured thinking, why would a data analyst organize the available information?

  • To ask SMART questions
  • To recognize the current problem or situation
  • To summarize results using data visualizations
  • To consult with subject matter experts

A local internet service provider is expecting an increase in the number of people streaming online entertainment. Their data analyst uses data to estimate the required bandwidth necessary to service its customers. This is an example of which problem type?

  • Discovering connections
  • Identifying themes
  • Making predictions
  • Spotting something unusual

Fill in the blank: The question, “How could we improve our website to simplify the returns process for our online customers?” is _____-oriented.

  • action
  • bias
  • passive
  • data
  1. Structured thinking involves which of the following processes? Select all that apply.
  • Revealing gaps and opportunities
  • Recognizing the current problem or situation
  • Organizing available information
  • Asking SMART questions
  1. A data analyst creates data visualizations and a slideshow. Which phase of the data analysis process does this describe?
  • Prepare
  • Act
  • Share
  • Process
  1. A recycling center that sponsors a podcast about saving the environment is an example of what strategy?
  • Defining the problem to be solved
  • Making recommendations
  • Staying on budget
  • Trying to reach a target audience
  1. A data analyst is working for a local power company. Recently, many new apartments have been built in the community, so the company wants to determine how much electricity it needs to produce for the new residents in the future. A data analyst uses data to help the company make a more informed forecast. This is an example of which problem type?
  • Spotting something unusual
  • Discovering connections
  • Making predictions
  • Identifying themes
  1. Describe the key difference between the problem types of categorizing things and identifying themes.
  • Categorizing things involves determining how items are different from each other. Identifying themes brings different items back together in a single group.
  • Categorizing things involves assigning grades to items. Identifying themes involves creating new classifications for items.
  • Categorizing things involves taking inventory of items. Identifying themes deals with creating labels for items.
  • Categorizing things involves assigning items to categories. Identifying themes takes those categories a step further, grouping them into broader themes.
  1. Which of the following examples are leading questions? Select all that apply.
  • What do you enjoy most about our service?
  • How did you learn about our company?
  • In what ways did our product meet your needs?
  • How satisfied were you with our customer representative?
  1. The question, “Why don’t our employees complete their timesheets each Friday by noon?” is not action-oriented. Which of the following questions are action-oriented and more likely to lead to change? Select all that apply.
  • What functionalities would make our timesheet web page more user-friendly?
  • What features could we add to our calendar app as a weekly timesheet reminder to employees?
  • How could we simplify the time-keeping process for our employees?
  • Why don’t employees prioritize filling out their timesheets by noon on Fridays?
  1. On a customer service questionnaire, a data analyst asks, “If you could contact our customer service department via chat, how much valuable time would that save you?” Why is this question unfair?
  • It is closed-ended
  • It uses slang words that not everyone can understand
  • It is vague
  • It makes assumptions

Shuffle Q/A

  1. Organizing available information and revealing gaps and opportunities are part of what process?
  • Identifying connections between two or more things
  • Categorizing things
  • Using structured thinking
  • Applying the SMART methodology
  1. The share phase of the data analysis process typically involves which of the following activities? Select all that apply.
  • Summarizing results using data visualizations
  • Communicating findings
  • Creating a slideshow to present to stakeholders
  • Putting analysis into action to solve a problem
  1. A company wants to make more informed decisions regarding next year’s business strategy. An analyst uses data to help identify how things will likely work out in the future. This is an example of which problem type?
  • Making predictions
  • Spotting something unusual
  • Identifying themes
  • Discovering connections
  1. Fill in the blank: Categorizing things involves assigning items to categories, whereas _____ takes those categories a step further, grouping them into broader classifications.
  • Making predictions
  • Finding patterns
  • Discovering connections
  • Identifying themes
  1. Questions that make assumptions often involve concepts that are formed without evidence. An example of this is an idea that is accepted as true without proof.
  • True
  • False
  1. A garden center wants to attract more customers. A data analyst in the marketing department suggests advertising in popular landscaping magazines. This is an example of what practice?
  • Reaching your target audience
  • Collecting customer information
  • Monitoring social media feedback
  • Developing a data analytics case study
  1. Categorizing things involves assigning items to categories. Identifying themes takes those categories a step further, grouping them into broader themes or classifications.
  • True
  • False
  1. Which of the following examples are closed-ended questions? Select all that apply.
  • Is math your favorite subject?
  • What grade did you get on the math test?
  • How old are you?
  • What are your thoughts about math?
  1. The question, “How could we improve our website to simplify the returns process for our online customers?” is action-oriented.
  • True
  • False
  1. Which of the following questions make assumptions? Select all that apply.
  • Keeping employees engaged is important, isn’t it?
  • Wouldn’t you agree that product A is better than product B?
  • Did you get through to customer service?
  • It must be frustrating waiting on hold for so long, right?
  1. Structured thinking involves recognizing the current problem or situation you’re facing and identifying your options.
  • True
  • False
  1. Which of the following examples are leading questions? Select all that apply.
  • How satisfied were you with our customer representative?
  • What do you enjoy most about our service?
  • In what ways did our product meet your needs?
  • How did you learn about our company?
  1. On a customer service questionnaire, a data analyst asks, “If you could contact our customer service department via chat, how much valuable time would that save you?” Why is this question unfair?
  • It is closed-ended
  • It uses slang words that not everyone can understand
  • It is vague
  • It makes assumptions
  1. Fill in the blank: To apply structured thinking, a data analyst should ______ the available information in order to reveal gaps and opportunities and recognize the current problem or situation.
  • organize
  • communicate
  • share
  • record
  1. A national chain of sporting goods stores advertises during popular sporting television broadcasts. This is an example of the company doing what?
  • Reaching its target audience
  • Demonstrating its support for a sports team
  • Defining the problem to be solved
  • Monitoring social feedback
  1. In data analysis, categorizing things involves which of the following?
  • Creating new classifications for items and assigning grades to items
  • Assigning items to categories
  • Taking an inventory of items
  • Determining how items are different from each other
  1. The question, “Why was the Monday afternoon yoga class successful?” is not measurable. Which of the following questions presents a measurable way to learn about the yoga class?
  • Why do people like taking yoga classes on Mondays?
  • How many customers responded to our recent half-price yoga promotion?
  • Is yoga a great way to stretch and strengthen your body?
  • Do yoga instructors seem more energetic at the beginning of the week?
  1. Why should a data analyst only ask fair questions?
  • Unfair questions do not have answers.
  • Unfair questions can provide data that is misleading.
  • Fair questions are biased.
  • Fair questions do not offend people.
  1. In the share step of the data analysis process, a data analyst summarizes their results using data visualizations and creates a slideshow to present to stakeholders. What else might they do in this step?
  • Collect data.
  • Communicate findings.
  • Organize the available information
  • Shred paper files.
  1. If a cooking supply store wants to attract more customers, where can they advertise to better reach their target audience? Select all that apply.
  • On TV during the season finale of The Best Chef in the Universe
  • At a bus stop near a local culinary school
  • On a podcast for foodies
  • In a magazine all about advertising
  1. Making predictions is one of the six data analytics problem types. How does data factor into such problem types?
  • The data informs the predictions.
  • The data confirms the decisions.
  • The data are the predictions.
  • The predictions validate the data.
  1. Which of the following examples are closed-ended questions? Select all that apply.
  • How tall are you?
  • What did you think about the article that I sent you?
  • What is your opinion of the new movie?
  • Have you taken this class before?
  1. What is the defining characteristic of measurable questions?
  • They are questions that have numbers in them.
  • Their answers are numbers that can be interpreted qualitatively.
  • They are questions that use numbers as categories.
  • Their answers are numbers that can be interpreted mathematically.
  1. Fill in the blank: “How many people filled out the survey?” is an example of a question that is _____ in the context of data analysis.
  • categorical
  • symbolic
  • measurable
  • qualitative

Week 2 – Data-driven decisions

An analyst is working with data from two school programs. They discover that the data is measured differently across programs and this may impact how they can work with the data. What does this example describe?

  • Data-inspired decision-making
  • Data-driven decision-making
  • The limitations of working with data
  • Data that cannot be analyzed

A retail store runs a special sale with the goal of increasing sales over the holiday season. They use the increase in sales over the same month last year as a starting point. What type of goal is this an example of?

  • Metric goal
  • Theoretical goal
  • Finite goal
  • Conceptual goal

A data analyst assesses how well their company’s marketing campaign is performing. They apply a formula that compares the cost of the campaign and its net profit. What does this formula measure?

  • The return on investment
  • Total revenue
  • The average cost
  • Total cost
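
The comparison of campaign cost and net profit described in this question can be written as a one-line calculation. The sketch below assumes the common definition of net profit divided by cost. The function name and the campaign figures are hypothetical.

```python
def return_on_investment(net_profit, cost):
    """Return net profit divided by the cost of the investment."""
    if cost == 0:
        raise ValueError("cost must be non-zero")
    return net_profit / cost


# Hypothetical campaign: it cost 2,000 and produced 500 in net profit.
campaign_roi = return_on_investment(net_profit=500, cost=2000)
campaign_roi_percent = campaign_roi * 100
```

Expressing the result as a percentage makes campaigns of different sizes easier to compare.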
  1. Which of the following statements describes an algorithm?
  • A process or set of rules to be followed for a specific task
  • A method for recognizing the current problem or situation and identifying the options
  • A tool that enables data analysts to spot something unusual
  • A technique for focusing on a single topic or a few closely related ideas
  1. Fill in the blank: If a data analyst is measuring qualities and characteristics, they are considering _____ data.
  • quantitative
  • unbiased
  • cleaned
  • qualitative
  1. In data analytics, reports use live, incoming data from multiple datasets; dashboards use static collections of data.
  • True
  • False
  1. A pivot table is a data-summarization tool used in data processing. Which of the following tasks can pivot tables perform? Select all that apply.
  • Group data
  • Clean data
  • Calculate totals from data
  • Reorganize data
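
Grouping rows and totaling each group is the core of what a pivot table does. The following sketch imitates that with plain Python. The sales rows and the pivot_total helper are hypothetical, and a real spreadsheet pivot table offers more summaries (counts, averages) than this shows.

```python
from collections import defaultdict

# Hypothetical sales rows, one dictionary per spreadsheet row.
sales = [
    {"region": "North", "product": "Soap", "amount": 120},
    {"region": "South", "product": "Soap", "amount": 80},
    {"region": "North", "product": "Shampoo", "amount": 200},
    {"region": "South", "product": "Shampoo", "amount": 50},
]


def pivot_total(rows, group_key, value_key):
    """Group rows by group_key and total value_key within each group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


# The same rows reorganized two ways, like swapping the pivot table's row field.
totals_by_region = pivot_total(sales, "region", "amount")
totals_by_product = pivot_total(sales, "product", "amount")
```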
  1. A metric is a single, quantifiable type of data that can be used for what task?
  • Setting and evaluating goals
  • Defining a problem type
  • Cleaning data
  • Sorting and filtering data
  1. Which of the following options describes a metric goal? Select all that apply.
  • Evaluated using metrics
  • Indefinite
  • Measurable
  • Based on theory
  1. Fill in the blank: Return on investment compares the _____ of an investment to the net profit gained from that investment.
  • success
  • purpose
  • cost
  • timing
  1. Fill in the blank: A data analyst is using data to address a large-scale problem. This type of analysis would most likely require _____. Select all that apply.
  • small data
  • data that reflects change over time
  • data represented by a limited number of metrics
  • big data

Shuffle Q/A

  1. Fill in the blank: In data analytics, qualitative data _____. Select all that apply.
  • is always time bound
  • measures qualities and characteristics
  • is subjective
  • measures numerical facts
  1. Fill in the blank: A _____ is a data-summarization tool used to sort, reorganize, group, count, total, or average data.
  • report
  • dashboard
  • function
  • pivot table
  1. Fill in the blank: A _____ goal is measurable and evaluated using single, quantifiable data.
  • metric
  • finite
  • conceptual
  • benchmark
  1. Describe the main differences between big and small data.
  • Small data is typically stored and organized in databases. Big data is typically stored and organized in spreadsheets.
  • Small data is less useful to data analysts. Big data is more useful to data analysts.
  • Small data is specific and concerns a short time period. Big data is less specific and concerns a longer time period.
  • Small data has been cleaned and sorted. Big data has not yet been cleaned or sorted.
  1. In data analytics, a pattern is defined as a process or set of rules to be followed for a specific task.
  • True
  • False
  1. In data analytics, quantitative data measures qualities and characteristics.
  • True
  • False
  1. In data analytics, reports use data that doesn’t change once it’s been recorded. Which of the following terms describes this type of data?
  • Comprehensive
  • Real-time
  • Monitored
  • Static
  1. Which data-summarization tool do data analysts use to sort, reorganize, group, count, total, or average data?
  • A function
  • A pivot table
  • A dashboard
  • A report
  1. A metric is a specific type of data that companies use to identify a problem domain.
  • True
  • False
  1. Fill in the blank: A metric goal is a _____ goal set by a company that is evaluated using metrics.
  • finite
  • theoretical
  • conceptual
  • measurable
  1. A data analyst is using data from a short time period to solve a problem related to someone’s day-to-day decisions. They are most likely working with small data.
  • True
  • False
  1. If a data analyst compares the cost of an investment to the net profit of that investment over a period of time, they’re analyzing the investment scope.
  • True
  • False
  1. What is an example of using a metric? Select all that apply.
  • Using column headers to sort and filter data
  • Using annual profit targets to set and evaluate goals
  • Using key performance indicators, such as click-through rates, to measure revenue
  • Using a pie chart to visualize data
  1. Fill in the blank: In data analytics, a process or set of rules to be followed for a specific task is _____.
  • a pattern
  • a domain
  • an algorithm
  • a value
  1. Fill in the blank: Return on investment compares the cost of an investment to the _____ of that investment.
  • purpose
  • timing
  • net profit
  • future success

Week 3 – More spreadsheet basics

What calculations can you carry out within a spreadsheet? Select all that apply.

  • Minimum
  • Maximum
  • Copying
  • Average

What are some of the ways that data analysts can gather data? Select all that apply.

  • Use data received from a colleague
  • Use data they collect themselves
  • Use data from open source locations
  • Use restricted data from the government

You sum the entries in cells F3 through F200 in your spreadsheet. What is the correct function for this?

  • =SUM(F3+F200)
  • =SUM(F3;F200)
  • =SUM(F3,F200)
  • =SUM(F3:F200)

What are some of the causes of bias in data analytics? Select all that apply.

  • Cultural differences
  • Social norms
  • Multiple perspectives
  • Serving an agenda
  1. Fill in the blank: In spreadsheets, data analysts begin _____ with an equal sign (=).
  • cells
  • numbers
  • formulas
  • charts
  1. Fill in the blank: The labels that describe the type of data contained in each column of a spreadsheet are called _____.
  • assignments
  • attributes
  • allowances
  • aspects
  1. Which of the following tasks might be performed using spreadsheets?
  • Maintain information about accounts
  • Write a sales pitch
  • Develop communication skills
  • Land a new client
  1. Formulas are created by the user, whereas functions are preset commands in spreadsheets.
  • True
  • False
  1. In the function =MAX(B5:B15), what does B5:B15 represent?
  • Observation
  • Column
  • Attribute
  • Range
  1. What is the correct spreadsheet formula for multiplying cell H2 times cell H5?
  • =H2/H5
  • =H2^H5
  • =H2*H5
  • =H2xH5
  1. To avoid bias when collecting data, a data analyst should keep what in mind?
  • Context
  • Opinion
  • Stakeholders
  • Graphs
  1. A data analyst might use descriptive column headers in order to achieve what goal?
  • Add context to their data
  • Protect the spreadsheet
  • Alphabetize the spreadsheet data
  • Filter the data

Shuffle Q/A

  1. To determine an organization’s annual budget, a data analyst might use a slideshow.
  • True
  • False
  1. Which of the following are ways that data analysts can add context to their data? Select all that apply.
  • Use descriptive column headers
  • Consider where the data came from
  • Create reports for stakeholders
  • Ask questions about the data
  1. In spreadsheets, formulas and functions end with an equal sign (=).
  • True
  • False
  1. A data analyst could use spreadsheets to achieve which of the following tasks?
  • Motivate employees
  • Write reports
  • Build code for a new app
  • Predict next quarter’s sales
  1. In the function =MAX(G3:G13), what does G3:G13 represent?
  • An attribute
  • An observation
  • The range
  • A table
  1. What is the correct spreadsheet formula for multiplying cell D5 times cell D7?
  • =D5xD7
  • =D5^D7
  • =D5*D7
  • =D5/D7
  1. Fill in the blank: A data analyst considers which organization created, collected, or funded a dataset in order to understand its _____.
  • structure
  • detail
  • length
  • context
  1. Which of the following statements accurately describe formulas and functions? Select all that apply.
  • Formulas are instructions that perform specific calculations.
  • Formulas may only be used once per spreadsheet column.
  • Functions are preset commands that perform calculations.
  • Formulas and functions assist data analysts in calculations, both simple and complex.
  1. In the function =MAX(B5:B15), what does B5:B15 represent?
  • Attribute
  • Column
  • Observation
  • Range
  1. What is the correct spreadsheet formula for multiplying cell H2 times cell H5?
  • =H2*H5
  • =H2/H5
  • =H2xH5
  • =H2^H5
  1. Both formulas and functions in spreadsheets begin with what symbol?
  • Equal sign (=)
  • Colon (:)
  • Hyphen (-)
  • Bracket ([)
  1. Fill in the blank: By negatively influencing data collection, ____ can have a detrimental effect on analysis.
  • objectivity
  • bias
  • partiality
  • filtering
  1. Attributes are used in spreadsheets for what purpose?
  • Analyze the data in a row
  • Insert data into each column
  • Add a new column
  • Label the data in each column
  1. To determine an organization’s annual budget, a data analyst might use a slideshow.
  • True
  • False
  1. Which of the following statements describes a key difference between formulas and functions?
  • Formulas contain words and numbers, and functions contain numbers only.
  • Formulas span two or more cells, and functions exist in only one cell.
  • Formulas are used in graphs, and functions are not.
  • Formulas are written by the user, and functions are already defined.
  1. What do data analysts use to label the type of data contained in each column in a spreadsheet?
  • Tables
  • Menus
  • Attributes
  • Headings
  1. In the function =MAX(A1:A12), what does A1:A12 represent?
  • The range
  • The operator
  • The maximum
  • The formula
  1. Fill in the blank: Putting data into context helps data analysts eliminate _____.
  • labels
  • intolerance
  • bias
  • fairness

Week 4 – Always remember the stakeholder

Fill in the blank: Your data analytics team is working on a project for the marketing department. The person most likely to be the _____ stakeholder is the vice president of marketing.

  • primary
  • necessary
  • secondary
  • project

To communicate clearly with stakeholders and team members, there are four key questions data analysts ask themselves. One of the questions is: What does my audience already know? Identify the remaining three questions. Select all that apply.

  • What does my audience need to know?
  • How can I communicate effectively to my audience?
  • Why are stakeholders and team members important?
  • Who is my audience?

You accept a new project from a high-level stakeholder. After beginning the project, you find that you aren’t sure what you are supposed to do. How do you handle this?

  • Determine the objectives that make the most sense and work towards those.
  • Set up a meeting with the stakeholder to discuss the specific objectives they wanted.
  • Ask a member of your team what was done on the last project and do the same.
  • Perform the standard analysis and present its insights.

A data analyst collects a large amount of data for their project to ensure that the data represents a diverse set of perspectives. What element of data collection does this describe?

  • Sample size
  • Statistical significance
  • Visualization
  • Data cleaning

When leading a meeting, it is important to respect your team members’ time. What are some ways of doing this? Select all that apply.

  • Pay attention to what others are saying
  • Arrive at the meeting on time
  • Discuss work that does not impact the attendees
  • Be prepared to talk about your work

What are some of the “don’ts” when attending a meeting?

  • Don't dominate the conversation.
  • Don't show up unprepared.
  • Don’t arrive late.
  • Don’t arrive early.

Your manager assigns you a project task, and you don’t understand the point of the project. What questions can you ask them to determine the objective? Select all that apply.

  • What is their end goal?
  • What do you have to do for this task?
  • What is the story they want to tell?
  • What is the big picture?
  1. A data analyst starts a new project for the operations team at their company. They take a few hours at the beginning of the project to identify their stakeholders. The secondary stakeholders are most likely which of the following people? Select all that apply.
  • The data analyst
  • The project manager
  • The president of the company
  • The vice president of operations
  1. A data analyst is researching the buying behavior of people who shop at a company’s retail store and those who might shop there in the future. During the analysis, it will be important to stay in communication with the people who most often interact with these shoppers. They are members of the executive team.
  • True
  • False
  1. There are four key questions data analysts ask themselves: Who is my audience? What do they already know? What do they need to know? And how can I communicate effectively with them? These questions enable data analysts to achieve what goal?
  • Understand who is managing the data
  • Communicate clearly with stakeholders and team members
  • Identify primary and secondary stakeholders
  • Complete data analysis projects on time
  1. Data analysts pay attention to sample size in order to achieve what goals? Select all that apply.
  • To fully understand the scope of the analytics project
  • To avoid a small sample size leading to inaccurate judgements
  • To make sure the data represents a diverse set of perspectives
  • To make sure a few unusual responses don’t skew results
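
To see why a few unusual responses matter less in a larger sample, consider this small sketch. The survey values are hypothetical: every typical respondent answers 4 and a single unusual respondent answers 40.

```python
from statistics import mean


def mean_with_outlier(typical_value, n_typical, outlier):
    """Average a sample of n_typical identical responses plus one unusual response."""
    sample = [typical_value] * n_typical + [outlier]
    return mean(sample)


# Hypothetical responses, for illustration only.
small = mean_with_outlier(4, 4, 40)   # 5 responses in total
large = mean_with_outlier(4, 99, 40)  # 100 responses in total
print(small, large)  # 11.2 4.36
```

The same outlier pulls the five-response average far from the typical answer, while the hundred-response average stays close to it.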
  1. A data analyst receives an email from the vice president of marketing. The vice president is upset because the report they want from the analyst is late. Select the best course of action.
  • The analyst should call the vice president and ask them how important it really is to their marketing efforts.
  • The analyst should send the report immediately, even if it’s not completely finished. This will make the vice president happy.
  • The analyst should respond saying they understand the vice president’s concerns, provide a status update, and let the vice president know when to expect the completed report.
  • The analyst should apologize for the delay and inform the vice president that the marketing managers caused the delay.
  1. Arriving at meetings prepared is an important part of creating a professional work environment. This involves which of the following actions? Select all that apply.
  • Bringing materials to take notes with
  • Considering what questions you may be asked so you’re prepared to answer
  • Reading the meeting agenda ahead of time
  • Bringing a laptop to keep an eye on emails
  1. A data analyst joins an online meeting on time. After reviewing the agenda, they see that their project comes at the very end. They’re extremely busy and can use this time to stay on top of their current projects. How should they proceed?
  • Mute themselves and turn off the camera, then continue working on other tasks until their project is mentioned.
  • Tell the participants that they’re having technical trouble, then leave the meeting to continue working on other tasks.
  • Politely let the presenter know they’re going to leave the meeting and rejoin toward the end.
  • Stay focused and attentive during the entire meeting. Even though some items on the agenda don’t affect their projects, they could still learn something or have something to contribute.
  1. Your data analytics team has been working on a project for a few weeks. You’re almost done, when your supervisor suddenly changes the business task. Everyone has to start all over again. You announce to the team that you’re going to say something to the supervisor about how unreasonable this is. What’s the best next step?
  • Insist that the entire data analytics team complain to your supervisor.
  • Go see your supervisor face-to-face and tell them why you’re so upset.
  • Write a polite, but strongly worded email to your supervisor.
  • Take a few minutes to calm down, then ask your colleagues to share their perspectives so you can work together to determine the best next step.

Shuffle Q/A

  1. A data analyst is researching the buying behavior of people who shop at a company’s retail store and those who might shop there in the future. During the analysis, it will be important to stay in communication with the team that most often interacts with these shoppers. What is the name of this team?
  • Data science team
  • Project management team
  • Executive team
  • Customer-facing team
  1. You receive an angry email from a colleague on the marketing team. The marketing colleague believes you have taken credit for their work. You do not believe this is true. Select the best course of action.
  • Delete the email. It’s best not to create any additional conflict.
  • Reply to the email, asking if they can schedule a time to talk about this in person in order to allow both of you to share your perspectives.
  • Walk over to the marketing colleague’s cubicle, and tell them you strongly disagree.
  • Forward the email to the marketing director with an equally angry note.
  1. A data analyst has been invited to a meeting. They review the agenda and notice that their data analysis project is one of the topics that will be discussed. How can they prepare for an effective meeting? Select all that apply.
  • Bring materials for taking notes.
  • Plan to arrive on time.
  • Think about what project updates they should share.
  • Create and share a revised agenda that includes many more details about their project.
  1. Which of the following steps are key to leading a professional online meeting? Select all that apply.
  • Maintaining control of the meeting by keeping everyone else on mute
  • Sitting in a quiet area that’s free of distractions
  • Making sure your technology is working properly before starting the meeting
  • Keeping an eye on your inbox during the meeting in case of an important email
  1. A team member has asked you to take on a task, and you don’t understand the point of the project. It seems like it will be a waste of your time. The best course of action would be to politely explain your concerns and decline the project.
  • True
  • False
  1. Fill in the blank: A data analytics team is working on a project to measure the success of a company’s new financial strategy. The vice president of finance is most likely to be the _____.
  • project manager
  • analyst
  • primary stakeholder
  • secondary stakeholder
  1. At an online marketplace, the _____ includes anyone in an organization who interacts with current or potential shoppers.
  • executive team
  • data science team
  • project management team
  • customer-facing team
  1. There are four key questions data analysts ask themselves: Who is my audience? What do they already know? What do they need to know? And how can I communicate effectively with them? These questions enable data analysts to identify the person in charge of managing the data.
  • True
  • False
  1. A data analyst has been invited to a meeting. They review the agenda and notice that their data analysis project is one of the topics that will be discussed. They plan to arrive on time and have a pen and paper to take notes. But they do not spend time considering project updates they could share or questions they may be asked. This is appropriate because they’re not the one running the meeting.
  • True
  • False
  1. A data analytics team is working on a project to measure the success of a company’s new financial strategy. Select the person most likely to be the primary stakeholder for this project.
  • The project manager
  • The data analyst
  • The vice president of finance
  • The director of analytics
  1. To communicate clearly with stakeholders and team members, there are four key questions data analysts ask themselves. One of them is: What does my audience need to know? Identify the remaining three questions. Select all that apply.
  • Why are stakeholders and team members important?
  • Who is my audience?
  • How can I communicate effectively to my audience?
  • What does my audience already know?
  1. Conflict is a natural part of working on a team. What are some ways to help shift a situation from problematic to productive? Select all that apply.
  • Take a moment to check your emotions before engaging in an argument.
  • Ask for a conversation to help you better understand the big picture.
  • Reframe the question by asking, “How can I help?”
  • Identify the person who caused the issue so they can take responsibility.
  1. Data analysts focus on statistical significance to make sure they have enough data so that a few unusual responses don’t skew results.
  • True
  • False
  1. A data analyst feels overworked. They often stay late to finish work, and have started missing deadlines. Their supervisor emails them another project to complete, and this causes the analyst even more stress. How should they handle this situation?
  • Accept the new project right away and hope to not miss another deadline.
  • Wait a few minutes to think it over, then respond with a meeting request to discuss this project and the general workload.
  • Walk into the supervisor’s office and tell them to give the project to someone else.
  • Respond immediately, letting the supervisor know the expectations at this company are unreasonable.
  1. When participating in an online meeting, it’s okay to keep your inbox open in another browser window. Participants won’t be distracted because they can’t see it, and you might receive a very important message.
  • True
  • False

Course challenge

Scenario 1, questions 1-5

You’ve just started a job as a data analyst at a small software company that provides data analytics and business intelligence solutions. Your supervisor asks you to kick off a project with a new client, Athena’s Story, a feminist bookstore. They have four existing locations, and the fifth shop has just opened in your community.

Athena’s Story wants to produce a campaign to generate excitement for an upcoming celebration and introduce the bookstore to the community. They share some data with your team to help make the event as successful as possible.

Your task is to review the assignment and the available data, then present your approach to your supervisor. Click the link below to access the email from your supervisor:

Course 2 Scenario 1 Email from Supervisor.pdf

PDF File

Then, review the email, and the Customer Survey and Historical Sales datasets.

To use the templates for the datasets, click the links below and select “Use Template.”

Links to templates: Customer Survey and Historical Sales

OR

If you don't have a Google account, you can download the CSV files directly from the attachments below.

CustomerSurvey - CustomerSurvey

CSV File

HistoricalSales - HistoricalSales

CSV File

After reading the email, you notice that the acronym WHM appears in multiple places. You look it up online, and the most common result is web host manager. That doesn’t seem right to you, as it doesn’t fit the context of a feminist bookstore. Still, you should assume it’s correct and continue with the project.

  • True
  • False

Scenario 1 continued

Now that you know WHM stands for Women’s History Month, you continue reviewing the datasets. You notice that the Customer Survey dataset contains both qualitative and quantitative data.

To use the template for the dataset, click the link below and select “Use Template.”

Link to template: Customer Survey

OR

If you don't have a Google account, you can download the CSV file directly from the attachment below.

CustomerSurvey - CustomerSurvey

CSV File

The qualitative data includes information from which columns? Select all that apply.

  • Column E (Survey Q5: What do you like most about Athena's Story?)
  • Column B (Survey Q2: If answered "Yes" to Q1, how do you plan to celebrate?)
  • Column D (Survey Q4: If answered "Yes" to Q3, how many books do you typically purchase during March?)
  • Column F (Survey Q6: What types of books would you like to see more of at Athena's Story?)

Scenario 1 continued

Next, you review the customer feedback in column F of the Customer Survey dataset.

To use the template for the dataset, click the link below and select “Use Template.”

Link to template: Customer Survey

OR

If you don't have a Google account, you can download the CSV file directly from the attachment below.

CustomerSurvey - CustomerSurvey

CSV File

The attribute of column F is, “Survey Q6: What types of books would you like to see more of at Athena's Story?” In order to verify that children’s literature and feminist zines are among the most popular genres, you create a visualization. This will help you clearly identify which genres are most likely to sell well during the Women’s History Month campaign.

Your visualization looks like this:

Pie chart categories: Feminist science fiction 4.8%, Books about women 2.4%, Women's journals 2.4%, Feminist literary criticism 2.4%, Children's literature 15.5%, Women's history books 2.4%, Biographies of inspiration 20.2%, Feminist fiction 26.2%, Feminist zines 14.3%, Feminist poetry 4.6%, Feminist novels 3.6%

Fill in the blank: The visualization you create demonstrates the percentages of each book genre that make up the total number of survey responses. It’s called a _____ chart.

  • bubble
  • pie
  • doughnut
  • area
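
The percentages behind a chart like this are each category's share of the total number of responses. A minimal sketch of that calculation follows; the answer list is hypothetical and is not taken from the Customer Survey dataset.

```python
from collections import Counter

# Hypothetical survey answers, for illustration only.
responses = [
    "Feminist fiction",
    "Feminist zines",
    "Feminist fiction",
    "Children's literature",
]


def genre_shares(answers):
    """Return each genre's share of all answers, as a percentage rounded to one decimal."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {genre: round(100 * n / total, 1) for genre, n in counts.items()}


shares = genre_shares(responses)
print(shares["Feminist fiction"])  # 50.0
```

Each slice of the chart would be drawn in proportion to its value in `shares`.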

Now that you’ve confirmed that children’s literature and feminist zines are among the most requested book genres, you review the Historical Sales dataset.

To use the template for the dataset, click the link below and select “Use Template.”

Link to template: Historical Sales

If you don't have a Google account, you can download the CSV file directly from the attachment below.

HistoricalSales - HistoricalSales

CSV File

You’re pleased to see that the dataset contains data that’s specific to children’s literature and feminist zines. This will provide you with the information you need to make data-inspired decisions. In addition, the children’s literature and feminist zines metrics will help you organize and analyze the data about each genre in order to determine if they’re likely to be profitable.

Next, you calculate the total sales over 52 weeks for feminist zines. You type =CALCULATE(E2-E53) but get an error. What is the correct syntax?

  • =MAX(E2:E53)
  • =COUNT(E2:E53)
  • =SUM(E2:E53)
  • =CALC(E2:E53)

Scenario 1 continued

After familiarizing yourself with the project and available data, you present your approach to your supervisor. You provide a scope of work, which includes important details, a schedule, and information on how you plan to prepare and validate the data. You also share some of your initial results and the pie chart you created.

In addition, you identify the problem type, or domain, for the data analysis project. You decide that the historical sales data can be used to provide insights into the types of books that will sell best during Women’s History Month this coming year. This will also enable you to determine if Athena’s Story should begin selling more children’s literature and feminist zines.

Using historical data to make informed decisions about how things may be in the future is an example of spotting something unusual.

  • True
  • False

Scenario 2, questions 6-10

You’ve completed this program and are now interviewing for your first junior data analyst position. You’re hoping to be hired by an event planning company, Patel Events Plus. Access the job description below:

Junior Data Analyst Job Description.pdf

PDF File

So far, you’ve successfully completed the first round of interviews with the human resources manager and director of data and strategy. Now, the vice president of data and strategy wants to learn more about your approach to managing projects and clients. Access the email you receive from the human resources director below:

Human Resources Director Email.pdf

PDF File

You arrive Thursday at 1:45 PM for your 2 PM interview. Soon, you’re taken into the office of Mila Aronowicz, vice president of data and strategy. After welcoming you, she begins the behavioral interview.

First, she hands you a copy of Patel Events Plus’s organizational chart. Access the chart below:

Patel Event Plus Org Chart.pdf

PDF File

As you’ve learned in this course, stakeholders are people who invest time, interest, and resources into the projects you’ll be working on as a data analyst. Let’s say you’re working on a project involving data and strategy.

Based on what you find in the organizational chart, who should be considered the primary stakeholder for projects involving data and strategy?

  • Director
  • Project manager
  • Chief executive officer
  • Vice president

Scenario 2 continued

Next, the vice president wants to understand your knowledge about asking effective questions. Consider and respond to the following question.

Let’s say we just completed a big event for a client and wanted to find out if they were satisfied with their experience. Provide some examples of measurable questions that you could include in the customer feedback survey. Select all that apply.

  • Would you recommend Patel Events Plus to a colleague or friend? Yes or no?
  • Why did you enjoy the event planned by Patel Events Plus?
  • On a scale from 1 to 5, with 1 being not at all likely and 5 being very likely, how likely are you to recommend Patel Events Plus?
  • How would you describe your event experience?

Scenario 2 continued

Now, the vice president presents a situation having to do with resolving challenges and meeting stakeholder expectations. Consider and respond to the following question.

You’re working on a rush project, and you discover your dataset is not clean. Even though it has numerous nulls, redundant data, and other issues, the primary stakeholder insists that you move ahead and use it anyway. The project timeline is so tight that there simply isn’t enough time for cleaning. How would you handle that situation?

  • Contact the stakeholder’s boss to let them know about the issue and ask for help managing the stakeholder’s expectations.
  • The stakeholder is in charge. It's best to do as they say and use the unclean dataset.
  • Clean the data as quickly as you can. It’s not perfect, but it’s better than it was before, and this way you can meet the deadline.
  • Communicate the situation to your supervisor and ask for advice on how to handle the situation with the stakeholder.

Scenario 2 continued

Your next interview question deals with sharing information with stakeholders. Consider and respond to the following question.

Let’s say you’ve designed a dashboard to give stakeholders easy, automatic access to data about an upcoming event. Describe the benefits of using a dashboard. Select all that apply.

  • Dashboards offer live monitoring of incoming data.
  • Dashboards enable stakeholders to interact with the data.
  • Dashboards are easy to design and understand.
  • Dashboards present pre-cleaned, historical data.

Scenario 2 continued

Your final behavioral interview question involves using metrics to answer business questions. Your interviewer hands you a copy of a Patel Events dataset.

To use the template for this dataset, click the link below and select “Use Template.”

Link to template: Patel Events Data

OR

If you don't have a Google account, you can download the CSV file directly from the attachment below.

Patel Events Plus dataset

CSV File

Then, she asks: Recently, Patel Events Plus purchased a new venue for our events. If we asked you to calculate the return on investment of this purchase, the metrics to consider would be the cost of the investment and what else?

  • Net profit in 2019
  • Average event revenues
  • 2019 events held at new venue
  • Purchase date

Scenario 1, questions 1-5

You’ve just started a job as a data analyst at a small software company that provides data analytics and business intelligence solutions. Your supervisor asks you to kick off a project with a new client, Athena’s Story, a feminist bookstore. They have four existing locations, and the fifth shop has just opened in your community.

Athena’s Story wants to produce a campaign to generate excitement for an upcoming celebration and introduce the bookstore to the community. They share some data with your team to help make the event as successful as possible.

Your task is to review the assignment and the available data, then present your approach to your supervisor. Click the link below to access the email from your supervisor:

Course 2 Scenario 1 Email from Supervisor.pdf

PDF File

Then, review the email, and the Customer Survey and Historical Sales datasets.

To use the templates for the datasets, click the links below and select “Use Template.”

Links to templates: Customer Survey and Historical Sales

OR

If you don't have a Google account, you can download the CSV files directly from the attachments below.

CustomerSurvey - CustomerSurvey

CSV File

HistoricalSales - HistoricalSales

CSV File

After reading the email, you notice that the acronym WHM appears in multiple places. You look it up online, and the most common result is web host manager. That doesn’t seem right to you, as it doesn’t fit the context of a feminist bookstore. You email your supervisor to ask. When writing your email, what do you do to ensure it sounds professional? Select all that apply.

  • Respect your supervisor’s time by writing an email that’s short and to the point.
  • Use a polite greeting and closing.
  • Read your email aloud before sending to catch any typos or grammatical errors and to ensure the communication is clear.
  • Write a clear subject line that gets a fast response so you can keep working: “WHM? NEED TO KNOW WHAT THAT IS RIGHT AWAY.”

Scenario 1 continued

Now that you know WHM stands for Women’s History Month, you continue reviewing the datasets. You notice that the Customer Survey dataset contains both qualitative and quantitative data.

To use the template for the dataset, click the link below and select “Use Template.”

Link to template: Customer Survey

OR

If you don't have a Google account, you can download the CSV file directly from the attachment below.

CustomerSurvey - CustomerSurvey

CSV File

The quantitative data includes information from which columns? Select all that apply.

  • Column D (Survey Q4: If answered "Yes" to Q3, how many books do you typically purchase during March?)
  • Column C (Survey Q3: Do you purchase feminist books in honor of WHM, either for yourself or as a gift for someone else?)
  • Column E (Survey Q5: What do you like most about Athena's Story?)
  • Column A (Survey Q1: Do you plan to celebrate WHM?)

Scenario 2, questions 6-10

You’ve completed this program and are now interviewing for your first junior data analyst position. You’re hoping to be hired by an event planning company, Patel Events Plus. Access the job description below:

Junior Data Analyst Job Description.pdf

PDF File

So far, you’ve successfully completed the first round of interviews with the human resources manager and director of data and strategy. Now, the vice president of data and strategy wants to learn more about your approach to managing projects and clients. Access the email you receive from the human resources director below:

Human Resources Director Email.pdf

PDF File

You arrive Thursday at 1:45 PM for your 2 PM interview. Soon, you’re taken into the office of Mila Aronowicz, vice president of data and strategy. After welcoming you, she begins the behavioral interview.

First, she hands you a copy of Patel Events Plus’s organizational chart. Access the chart below:

Patel Event Plus Org Chart.pdf

PDF File

As you’ve learned in this course, stakeholders are people who invest time, interest, and resources into the projects you’ll be working on as a data analyst. Let’s say you’re working on a project involving data and strategy.

Based on what you find in the organizational chart, which individuals are considered the secondary stakeholders? Select all that apply.

  • Project manager, analytics
  • Data analytics coordinator
  • Director, data analytics
  • Chief executive officer

Scenario 2 continued

Next, the vice president wants to understand your knowledge about asking effective questions. Consider and respond to the following question.

Let’s say we just completed a big event for a client and wanted to find out if they were satisfied with their experience. Provide some examples of measurable questions that you could include in the customer feedback survey. Select all that apply.

  • How would you rate your overall experience — poor, average, above average, or excellent?
  • Why did our event options and features create a successful event?
  • Was this your first time using Patel Events Plus to plan your event? Yes or no?
  • Did you experience any problems with your event? Yes or no?

Now, the vice president presents a situation having to do with resolving challenges and meeting stakeholder expectations. Consider and respond to the following question.

You’re working with a dataset that the data analytics coordinator should have cleaned, but it turns out that it wasn’t. Your supervisor thought the dataset was ready for use, but you discover nulls, redundant data, and other issues. The project is due in less than two weeks. Which of the following options would be an appropriate approach? Select all that apply.

  • Proceed with the project using the available data. You don’t want to get the associate data analyst in trouble, and you don’t want to miss your deadline.
  • Email the data analytics coordinator to ask if the two of you can work together to clean the data, as the project is on a tight timeline.
  • Provide your supervisor with a proposed revised timeline. Politely explain that you need some additional time to clean the data.
  • Email your supervisor and the data analytics coordinator to communicate about the issue. Ask if you can meet to come up with a solution.

Scenario 2 continued

Your next interview question deals with sharing information with stakeholders. Consider and respond to the following question. Select all that apply.

Let’s say you’ve created a report to present stakeholders with information about an upcoming event. Describe the benefits of using a report. Select all that apply.

  • Reports enable stakeholders to interact with the data.
  • Reports offer live monitoring of incoming data.
  • Reports reflect data that’s already been cleaned and sorted.
  • Reports provide a snapshot of high-level, historical data.

Course 3 – Prepare Data for Exploration

Week 1 – Data types and structures

A data analyst is preparing an annual report for company executives and decides to use internal data. Why do they choose to use internal data? Select all that apply.

  • Internal data is easier to collect.
  • Internal data is less likely to need cleaning.
  • Internal data is more reliable.
  • Internal data is less vulnerable to biased collection.

A data analyst is reviewing data that has been organized into a table format. What type of data is in the table?

  • Unstructured data
  • Internal data
  • External data
  • Structured data

A data analyst is reviewing a spreadsheet. They find that the columns contain the data variables. What data format does this describe?

  • Tall data
  • Wide data
  • Short data
  • Narrow data
  1. A data analyst at a book publisher is working on an urgent report for executives. They are using only historical data. What is the most likely reason for choosing to analyze only historical data?
  • The project has a very short time frame
  • The data is unknown
  • There is plenty of time to research historical data
  • The data is constantly changing
  2. Which of the following are examples of discrete data? Select all that apply.
  • Box office returns
  • Movie running time
  • Movie budget
  • Number of actors in movie
  3. Which of the following questions collects nominal qualitative data?
  • Is this your first time dining at this restaurant?
  • How many people do you usually dine with?
  • How many times have you dined at this restaurant?
  • On a scale of 1-10, how would you rate your service today?
  4. Why is internal data considered more reliable and easier to collect than external data?
  • Internal data circumvents privacy restrictions.
  • Internal data comes from people you know.
  • Internal data has much larger sample sizes.
  • Internal data lives within a company’s own systems.
  5. A social media post is an example of structured data.
  • True
  • False
  6. Fill in the blank: A Boolean data type can have _____ possible values.
  • three
  • 10
  • two
  • infinite
  7. The following is a selection from a spreadsheet:

What kind of data format does it contain?

  • Short
  • Wide
  • Narrow
  • Long
  8. A data analyst is working in a spreadsheet application. They use Save As to change the file type from .XLS to .CSV. This is an example of a data transformation.
  • True
  • False

Shuffle Q/A

  1. A data analyst is working on an urgent traffic study. As a result of the short time frame, which type of data are they most likely to use?
  • Theoretical
  • Historical
  • Personal
  • Unclean
  1. Nominal qualitative data has a set order or scale.
  • True
  • False
  1. Internal data is more reliable because it’s clean.
  • True
  • False
  1. Structured data is likely to be found in which of the following formats? Select all that apply.
  • Audio file
  • Digital photo
  • Spreadsheet
  • Table
  1. A Boolean data type must have a numeric value.
  • True
  • False
  1. In long data, separate columns contain the values and the context for the values, respectively. What does each column contain in wide data?
  • A specific constraint
  • A specific data type
  • A unique data variable
  • A unique format
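To make the wide versus long distinction concrete, the sketch below reshapes a small wide table into long form using only the Python standard library. The table, its column names (`country`, `2019`, `2020`), and the helper `wide_to_long` are invented for this illustration and do not come from the course materials.

```python
# Hypothetical wide table: one row per subject, one column per data variable (year).
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_column, variable_name, value_name):
    """Return one output row per (subject, variable) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every long row
            long_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, "country", "year", "population")
for row in long_rows:
    print(row)
```

In the long result, one column holds the context (`year`) and another holds the value (`population`), whereas in the wide input each year is its own column.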
  1. Fill in the blank: Data transformation enables data analysts to change the _____ of the data.
  • value
  • structure
  • accuracy
  • meaning
  1. Continuous data is measured and has a limited number of values.
  • True
  • False
  1. Which of the following values are examples of a Boolean data type? Select all that apply.
  • True or false
  • Yes, no, or unsure
  • Yes or no
  • One, two, or three
  1. If you have a short time frame for data collection and need an answer immediately, you likely will have to use historical data.
  • True
  • False
  1. Which of the following is an example of continuous data?
  • Leading actors in movie
  • Box office returns
  • Movie run time
  • Movie budget
  1. Which of the following questions collect nominal qualitative data? Select all that apply.
  • How likely are you to recommend this restaurant to a friend?
  • Is this your first time dining at this restaurant?
  • Have you heard of our frequent diner program?
  • Did anyone recommend our restaurant to you today?
  1. Data transformation can change the structure of the data. An example of this is taking data stored in one format and converting it to another.
  • True
  • False
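A change of file format is one simple kind of data transformation: the structure changes while the values stay the same. The sketch below converts CSV text to JSON with the Python standard library. The sample rows and the helper name `csv_to_json` are assumptions made for this example.

```python
import csv
import io
import json

# Hypothetical CSV content standing in for a spreadsheet export.
csv_text = "name,city\nAna,Berlin\nLi,Lima\n"


def csv_to_json(text):
    """Convert CSV text to a JSON array of objects; the values are unchanged."""
    records = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(records)


json_text = csv_to_json(csv_text)
print(json_text)
```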
  1. Which of the following is a benefit of internal data?
  • Internal data is less vulnerable to biased collection.
  • Internal data is the only data relevant to the problem.
  • Internal data is less likely to need cleaning.
  • Internal data is more reliable and easier to collect.

Week 2 – Bias, credibility, privacy, ethics, and access

Which of the following best describes data bias?

  • It is a preference in the data in favor of or against a person, group, or thing.
  • It is a measure of how closely the data represent the population.
  • It refers to how consistent the data is over time as new data is added.
  • It is the tendency for the data to remain accurate for longer.

In data ethics, consent gives an individual the right to know the answers to which of the following questions? Select all that apply.

  • How will my data be used?
  • How long will my data be stored?
  • Why am I being forced to share my data?
  • Why is my data being collected?

An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This concept refers to which aspect of data ethics?

  • Ownership
  • Currency
  • Transaction transparency
  • Consent

A company collects and analyzes user data. As part of this process, they preserve each data subject’s information and activity for all data transactions. What data ethics concept does this describe?

  • Consent
  • Privacy
  • Transformation
  • Transparency
  1. Fill in the blank: A preference in favor of or against a person, group of people, or thing is called _____. It is an error in data analytics that can systematically skew results in a certain direction.
  • data collection
  • data interoperability
  • data bias
  • data anonymization
  2. Which type of bias is the tendency to always construe ambiguous situations in a positive or negative way?
  • Observer
  • Confirmation
  • Sampling
  • Interpretation
  3. Which of the following are qualities of unreliable data? Select all that apply.
  • Biased
  • Inaccurate
  • Vetted
  • Incomplete
  4. Fill in the blank: Data _____ refers to well-founded standards of right and wrong that dictate how data is collected, shared, and used.
  • ethics
  • privacy
  • credibility
  • anonymization
  5. Ownership is a key issue in data ethics. Who owns data?
  • The organization that invests time and money collecting, processing, and analyzing the data
  • The government that passes data-protection legislation
  • The individual who originally generates the data
  • The law enforcement agencies that enforce data protection laws
  6. An employer accesses an employee’s credit report without their consent. This is not a violation of the employee’s privacy because they work at the company.
  • True
  • False
  7. What is the process of protecting people’s private or sensitive data by eliminating identifying information?
  • Data governance
  • Data design
  • Data ethics
  • Data anonymization
  8. A key aspect of open data is free access to people’s personal information.
  • True
  • False

Shuffle Q/A

  1. A clinic surveys a group of male and female patients about their experience with physical therapy. The survey does not include people with disabilities. Is the survey data biased?
  • Yes
  • No
  1. A university surveys its student-athletes about their experience in college sports. The survey only includes student-athletes with scholarships. What type of bias is this an example of?
  • Interpretation bias
  • Observer bias
  • Confirmation bias
  • Sampling bias
  1. An individual who provides their data has the right to know and understand all of the data-processing activities and algorithms used on that data. This is called ownership.
  • True
  • False
  1. The right to inspect, update, or correct your own data is part of which aspect of data ethics?
  • Data openness
  • Data ownership
  • Data consent
  • Data privacy
  1. Interoperability is key to open data’s success. Which of the following is an example of interoperability?
  • A website charges a fee to access a database
  • An analyst removes all personally identifiable information from a database
  • Different databases use common formats and terminology
  • A company restricts the use of a database to its own employees
  1. Which of the following situations are examples of bias? Select all that apply.
  • A researcher who surveys a sample group that is representative of the population
  • A scholar who only reads sources that support their argument
  • A dancing competition judge who is a close friend of the dancer who wins the competition
  • A daycare that won’t hire men for childcare positions
  1. Which of the following “C’s” describe qualities of good data? Select all that apply.
  • Comprehensive
  • Cited
  • Current
  • Consequential
  1. If a company uses your personal data as part of a financial transaction, you should be made aware of the nature and scale of the transaction. What concept of data ethics does this refer to?
  • Privacy
  • Currency
  • Ownership
  • Consent
  1. Data anonymization applies to both text and images.
  • True
  • False
  1. The government of a large city collects data on the quality of the city’s infrastructure. Any business, nonprofit organization, or person can access the government’s databases and re-use or redistribute the data. Is this an example of open data?
  • Yes
  • No
  1. Which of the following are types of data bias often encountered in data analytics? Select all that apply.
  • Observer bias
  • Interpretation bias
  • Educational bias
  • Confirmation bias
  1. In general, the usefulness of data decreases as time passes.
  • True
  • False
  1. Ownership is a key issue in data ethics. Who owns data?
  • The law enforcement agencies that enforce data protection laws
  • The organization that invests time and money collecting, processing, and analyzing the data
  • The individual who originally generates the data
  • The government that passes data-protection legislation
  1. Which of the following are commonly used methods for anonymizing data? Select all that apply.
  • Masking
  • Hashing
  • Deleting
  • Blanking
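As a rough illustration of two of the anonymization methods named above, the sketch below masks and hashes an email address with the Python standard library. The helper names `mask_email` and `hash_value`, the sample address, and the salt are all hypothetical. Treat this as a teaching sketch only: real anonymization usually needs more than a single hash, since common values can be guessed and re-hashed.

```python
import hashlib


def mask_email(email):
    """Masking: hide most of the local part but keep the overall shape."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def hash_value(value, salt):
    """Hashing: replace the value with a fixed-length SHA-256 hex digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


email = "jordan@example.com"  # made-up address
print(mask_email(email))
print(hash_value(email, salt="demo-salt"))
```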

Week 3 – Databases: Where data lives

Which of the following properties describe primary keys in a relational database? Select all that apply.

  • They are used to ensure data in a specific column is unique.
  • They refer to another primary key in a different table.
  • There can be multiple primary keys in a table.
  • There can only be one primary key in a table.
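The uniqueness that a primary key enforces, and the way a foreign key points at a primary key in another table, can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their columns are invented for this example, and SQLite is only one possible database; other systems behave similarly but differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1)")


def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert("INSERT INTO customers VALUES (1, 'Li')")  # repeats primary key 1
orphan_ok = try_insert("INSERT INTO orders VALUES (101, 99)")  # no customer 99 exists
print(duplicate_ok, orphan_ok)
```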

What do metadata repositories do to make it simpler and quicker to use multiple data sources for analysis? Select all that apply.

  • Keep metadata in a common structure
  • Keep the metadata in an accessible form
  • Store the related data assets
  • Describe where data came from

Which type of metadata is used to indicate where a digital asset or piece of information originated from?

  • Structural
  • Administrative
  • Descriptive
  • General

What is the process that data analysts use to ensure the formal management of their company’s data assets?

  • Data integrity
  • Data aggregation
  • Data mapping
  • Data governance

What are some of the reasons for open data initiatives? Select all that apply.

  • To educate citizens about local issues
  • To make government activities more transparent
  • To increase protection of proprietary data
  • To give people ways to provide feedback to the government

A nonprofit has a list of their many donors. They want to send a mailing to donors who live within 100 miles of the nonprofit’s headquarters. How could they use the column distance_to_hq to only display the donors that meet those conditions?

  • Filter out distances smaller than 50 miles.
  • Filter out distances greater than 100 miles.
  • Sort numerically in ascending order.
  • Sort numerically in descending order.
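Filtering on a numeric column can be sketched in a few lines of Python. Only the column name `distance_to_hq` comes from the question; the donor records and the helper `within_distance` are made up for illustration.

```python
# Hypothetical donor records; only distance_to_hq comes from the question.
donors = [
    {"name": "Ana", "distance_to_hq": 12.5},
    {"name": "Li", "distance_to_hq": 100.0},
    {"name": "Sam", "distance_to_hq": 240.0},
]


def within_distance(rows, max_miles):
    """Keep rows at or under max_miles, i.e. filter out greater distances."""
    return [row for row in rows if row["distance_to_hq"] <= max_miles]


nearby = within_distance(donors, 100)
print([row["name"] for row in nearby])
```

Sorting alone would reorder the donors but still display all of them, which is why a filter is the better fit here.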

In the following piece of SQL code, what does the asterisk (*) represent?

SELECT * FROM customers

  • Include all columns.
  • Include all tables.
  • Include the first column.
  • Include specified conditions.

You are working with a database table that contains customer data. The company column lists the company affiliated with each customer. You want to find customers from the company Riotur.

You write the SQL query below.

SELECT * FROM Customer

What code would be added to return only customers affiliated with the company Riotur?

  • company = 'Riotur'
  • WHERE company = 'Riotur'
  • JOIN company = 'Riotur'
  • IN company = 'Riotur'
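The effect of `SELECT *` combined with a `WHERE` clause can be tried out with Python's built-in `sqlite3` module. The table contents below are invented, and the extra columns (`customer_id`, `name`) are assumptions; only the table name `Customer` and the `company` column come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT, company TEXT)"
)
conn.executemany(
    "INSERT INTO Customer (name, company) VALUES (?, ?)",
    [("Ana", "Riotur"), ("Li", "Acme"), ("Sam", "Riotur")],  # made-up rows
)

# SELECT * returns every column; WHERE keeps only the matching rows.
riotur_rows = conn.execute(
    "SELECT * FROM Customer WHERE company = ?", ("Riotur",)
).fetchall()
print(riotur_rows)
```

The `?` placeholder passes the value separately from the SQL text. Written as a literal, the same filter is `WHERE company = 'Riotur'`, with straight single quotes around the string.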
  1. Primary and foreign keys are two connected identifiers within separate tables. These tables exist in what kind of database?
  • Metadata
  • Primary
  • Relational
  • Normalized
  2. When working with data from an external source, what can metadata help data analysts do? Select all that apply.
  • Ensure data is clean and reliable
  • Combine data from more than one source
  • Understand the contents of a database
  • Choose which analyses to run

  3. Think about data as a student at a high school. In this metaphor, which of the following are examples of metadata? Select all that apply.

  • Student’s ID number
  • Student’s enrollment date
  • Classes the student is enrolled in
  • Grades the student earns
  4. Fill in the blank: Data _____ is the process of ensuring the formal management of a company’s data assets.
  • aggregation
  • integrity
  • mapping
  • governance
  5. In what circumstance might a data analyst choose not to use external data in their analysis?
  • The data represents diverse perspectives
  • The data is too thorough
  • The data is free for anyone to access
  • The data cannot be confirmed to be reliable
  6. A nonprofit maintains a list of how many laptops they provide to each school in the county. In the table, there is a column called number_of_laptops. A data analyst wants to determine which schools were given the fewest laptops. How should they sort the data to return these schools first?
  • Sort alphabetically in ascending order
  • Sort numerically in descending order
  • Sort alphabetically in descending order
  • Sort numerically in ascending order

  7. When writing a query, it's necessary for the name of the dataset to be inside two backticks in order for the query to run properly.

  • True
  • False
  8. You are working with a database table that contains customer data. The city column lists the city where each customer is located. You want to find out which customers are located in Berlin.

You write the SQL query below. Add a WHERE clause that will return only customers located in Berlin.

How many customers are located in Berlin?

  • 9
  • 12
  • 2
  • 7

Shuffle Q/A

  1. Relational databases contain a series of tables connected to form relationships. Which two types of fields exist in two connected tables?
  • Star and snowflake schemas
  • Descriptive and structural metadata
  • Internal and external data
  • Primary and foreign keys
  1. Data analysts use metadata for what tasks? Select all that apply.
  • To combine data from more than one source
  • To perform data analyses
  • To interpret the contents of a database
  • To evaluate the quality of data
  1. Think about data as driving a taxi cab. In this metaphor, which of the following are examples of metadata? Select all that apply.
  • Company that owns the taxi
  • License plate number
  • Make and model of the taxi cab
  • Passengers the taxi picks up
  1. Fill in the blank: Data governance is the process of ensuring that a company’s _____ are managed in a formal manner.
  • business tasks
  • data engineers
  • data assets
  • business strategies
  1. What are some key benefits of using external data? Select all that apply.
  • External data is always reliable.
  • External data is free to use.
  • External data has broad reach.
  • External data can provide industry-level perspectives.
  1. A data analyst reviews a national database of movie theater showings. They want to find the first movies shown in San Francisco in 2001. How can they organize the data to return the first 10 movies shown at the top of their list? Select all that apply.
  • Filter out showings not in 2001
  • Sort by date in descending order
  • Sort by date in ascending order
  • Filter out showings outside of San Francisco
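Combining two filters with an ascending sort, then keeping only the top rows, might look like the sketch below. The showings, the field names (`title`, `city`, `shown_on`), and the helper `first_showings` are hypothetical.

```python
from datetime import date

# Hypothetical showings; the field names are illustrative only.
showings = [
    {"title": "Film C", "city": "San Francisco", "shown_on": date(2001, 3, 9)},
    {"title": "Film A", "city": "San Francisco", "shown_on": date(2001, 1, 5)},
    {"title": "Film B", "city": "Oakland", "shown_on": date(2001, 1, 2)},
    {"title": "Film D", "city": "San Francisco", "shown_on": date(2000, 12, 30)},
]


def first_showings(rows, city, year, limit=10):
    """Filter to one city and year, sort by date ascending, and keep the top rows."""
    matches = [r for r in rows if r["city"] == city and r["shown_on"].year == year]
    return sorted(matches, key=lambda r: r["shown_on"])[:limit]


earliest = first_showings(showings, "San Francisco", 2001)
print([r["title"] for r in earliest])
```

To put the most recent rows first instead, the same filters apply and the sort runs in descending order.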
  1. You are working with a database table that contains customer data. The state column lists the state where each customer is located. The state names are abbreviated. You want to find out which customers are located in the state of Florida (FL).

You write the SQL query below. Add a WHERE clause that will return only customers located in FL.

How many customers are located in FL?

  • 6
  • 4
  • 1
  • 3
  1. Structural metadata indicates how a piece of data is organized and whether it’s part of one or more than one data collection.
  • True
  • False
  1. Relational databases illustrate relationships between tables. Which fields represent the connection between these tables? Select all that apply.
  • Foreign keys
  • External keys
  • Primary keys
  • Secondary keys
  1. When writing a query, you must remove the two backticks around the name of the dataset in order for the query to run properly.
  • True
  • False
  1. You are working with a database table that contains customer data. The first_name column lists the first name of each customer. You are only interested in customers with the first name Mark.

You write the SQL query below. Add a WHERE clause that will return only customers named Mark.

How many customers are named Mark?

  • 1
  • 5
  • 3
  • 2
  1. Metadata is data about data. What kinds of information can metadata offer about a particular dataset? Select all that apply.
  • How to combine the data with another dataset
  • Which analyses to perform on the data
  • If the data is clean and reliable
  • What kinds of data it contains
  1. A data analyst reviews a database of Wisconsin car sales to find the last car models sold in Milwaukee in 2019. How can they sort and filter the data to return the last five cars sold at the top of their list? Select all that apply.
  • Filter out sales outside of Milwaukee
  • Filter out sales not in 2019
  • Sort by sale date in descending order
  • Sort by sale date in ascending order
  1. When writing a query, the name of the dataset can either be inside two backticks, or not, and the query will still run properly.
  • True
  • False
  1. A data analyst chooses not to use external data because it represents diverse perspectives. This is an appropriate decision when working with external data.
  • True
  • False

Week 4 – Organizing and protecting your data

A data analyst has been tasked with a new project and has started to collect data from multiple sources. The analyst will be working with multiple team members on this project and needs to create a naming convention to allow project files to be located efficiently. What should the analyst include in each file's name? Select all that apply.

  • Content
  • Collaborators
  • Version number
  • Creation date

Your boss assigns you a new multi-phase project and you create a naming convention for all of your files. With this project lasting years and incorporating multiple analysts, it’s crucial that you create data explaining how your naming conventions are structured. What is this data called?

  • Named convention
  • Labeled data
  • Metadata
  • Descriptive data

A data analyst creates a file that lists people who donated to their organization’s fund drive. An effective name for the file is FundDriveDonors_20210216_V01.

  • True
  • False
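A naming convention like the one in this question (content, creation date, version number) is easy to apply consistently in code. The sketch below builds and checks such names. The pattern `Content_YYYYMMDD_Vnn` is an assumption inferred from the example file name, and the helper names are hypothetical.

```python
import re
from datetime import date


def build_file_name(content, created, version):
    """Join content, a YYYYMMDD creation date, and a two-digit version number."""
    return f"{content}_{created:%Y%m%d}_V{version:02d}"


# Assumed pattern: Content_YYYYMMDD_Vnn, mirroring the example name in the question.
NAME_PATTERN = re.compile(r"^[A-Za-z]+_\d{8}_V\d{2}$")


def follows_convention(name):
    """Return True when a name matches the assumed pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


name = build_file_name("FundDriveDonors", date(2021, 2, 16), 1)
print(name, follows_convention(name))
```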

You have just started a new project and have created a naming convention for all of your files. Once the data has been collected, you start foldering. What does the foldering process allow you to do?

  • Organize your files into subfolders
  • Sort your files by name
  • Organize your files in the filing cabinet
  • Organize your files into the cloud

A data analyst deletes an old project’s files from their active project folder. A few months later, they have to review the work that they completed on this project but cannot find the older project files. What should the data analyst have done?

  • Archive the project
  • Email the project
  • Print the project
  • Delete the project

Folder organization is key to working efficiently as a data analyst. A common practice is to lay out your folders with broad topics at the top and more specific topics at the bottom. What’s the name of this approach?

  • Bottom to top
  • Left to right
  • Heterarchy
  • Hierarchy

To reduce clutter, a data analyst hides cells that contain long, complex formulas. The hidden cells allow the data analyst to protect their formulas and hide the data from other users with access to the spreadsheet.

  • True
  • False
  1. Fill in the blank: File-naming conventions are _____ that describe a file's content, creation date, or version.
  • general attributes
  • frequent suggestions
  • consistent guidelines
  • common verifications
  2. A data analytics team uses data about data to indicate consistent naming conventions for a project. What type of data is involved in this scenario?
  • Metadata
  • Aggregated data
  • Long data
  • Big data
  3. Data analysts use naming conventions to help them identify or locate a file. Which of the following is an example of an effective file name?
  • Elementary_Students_20090221_V03
  • Sept_ElemtaryStudents_V1
  • ElementarySchoolStudents_EnrollingSeptember2021_PlusRisingMiddleSchool_FJPSKVND
  • Elem_9
  4. Data analysts use a process called encryption to organize folders into subfolders.
  • True
  • False
  5. A data analyst completes a project. They move project files to another location to keep them separate from their current work. This is an example of what process?
  • Duplicating files
  • Destroying files
  • Archiving files
  • Renaming files
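Moving a finished project out of the active workspace can be sketched with the Python standard library. The folder names and the helper `archive_project` are hypothetical, and the demonstration runs inside a temporary directory so that no real files are touched.

```python
import shutil
import tempfile
from pathlib import Path


def archive_project(project_dir, archive_root):
    """Move a finished project folder under archive_root and return its new path."""
    archive_root.mkdir(parents=True, exist_ok=True)
    destination = archive_root / project_dir.name
    shutil.move(str(project_dir), str(destination))
    return destination


# Demonstration in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    active = Path(tmp) / "active" / "fund_drive_2021"
    active.mkdir(parents=True)
    (active / "donors.csv").write_text("name\nAna\n")
    archived = archive_project(active, Path(tmp) / "archive")
    moved_ok = (archived / "donors.csv").exists() and not active.exists()
    print(moved_ok)
```

The files are moved rather than deleted, so past work stays available for later review.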

  6. Data analysts create hierarchies to organize their folders. How are folder hierarchies structured?

  • Broad topics at the top, then more specific topics below
  • Broad topics at the right, then more specific topics at the left
  • Specific topics at the top, then more broad topics below
  • Broad topics at the left, then more specific topics at the right
  7. Using encryption to protect data is an example of what?
  • Data validation
  • Data integrity
  • Data ethics
  • Data security
  8. To reduce clutter, a data analyst hides cells that contain long, complex formulas. To view the formulas again, the analyst will need to adjust the spreadsheet sharing or encryption settings.
  • True
  • False

Shuffle Q/A

  1. A data analyst is working with a file from a customer satisfaction survey. The survey was sent to anyone who became a customer between April and June, 2020. Which of the following is an effective name for the file?
  • April_May_June_2020_Responses_to_New_Customer_Survey_ANALYSISDATA_928310
  • NewCustomerSurvey_2020-6-20_V03
  • Survey_Responses
  • Apr-June2020_CustSurvey_V
  1. Foldering may be used by data analysts to organize folders into what?
  • Databases
  • Subfolders
  • Versions
  • Tables
  1. Data analysts use archiving to separate current from past work. It also cuts down on clutter.
  • True
  • False
  1. Fill in the blank: Data analysts create _____ to structure their folders.
  • scales
  • sequences
  • ladders
  • hierarchies
  1. A data analyst wants to ensure only people on their analytics team can access, edit, and download a spreadsheet. They can use which of the following tools? Select all that apply.
  • Sharing permissions
  • Templates
  • Filtering
  • Encryption
  1. A data analyst wants to share spreadsheet tab A with their team. They’re still working with tabs B and C, and they don’t want their team members to access them yet. Hiding tabs B and C will protect them from being accessed.
  • True
  • False
  1. A data analytics team labels its files to indicate their content, creation date, and version number. The team is using what data organization tool?
  • File-naming verifications
  • File-naming conventions
  • File-naming attributes
  • File-naming references
  1. To align file naming and storage practices, it’s useful to develop metadata practices with your data analytics team.
  • True
  • False
  1. What process do data analysts use to keep project-related files together and organize them into subfolders?
  • Foldering
  • Encrypting
  • Editing
  • Naming
  1. A data analyst completes a project. They move project files to another location to keep them separate from their current work. This is an example of what process?
  • Renaming files
  • Archiving files
  • Destroying files
  • Duplicating files
  1. A data analyst adds sharing permissions to limit who can edit the data contained within a file. This is an example of what?
  • Data validation
  • Data integrity
  • Data security
  • Data ethics
  1. What aspects of a file do file-naming conventions typically describe? Select all that apply.
  • Creation date
  • Content
  • Version number
  • Collaborators
  1. Fill in the blank: A data analytics team uses _____ to indicate consistent naming conventions for a project. This is an example of using data about data.
  • folder hierarchies
  • classifications
  • metadata
  • version control
  1. A data analyst creates a file that lists people who donated to their organization’s fund drive. An effective name for the file is FundDriveDonors_20210216_V01.
  • True
  • False
  1. Data analysts use archiving to separate current from past work. What does this process involve?
  • Using secure data-erase software to destroy old files
  • Reviewing current data files to confirm they’ve been cleaned
  • Moving files from completed projects to another location
  • Reorganizing and renaming current files
  1. Data analysts create hierarchies to organize their folders. They do this by structuring folders by specific topics at the top, then more broadly below.
  • True
  • False

Course challenge

Scenario 1, questions 1-5

You’ve been working at a data analytics consulting company for the past six months. Your team helps restaurants use their data to better understand customer preferences and identify opportunities to become more profitable.

To do this, your team analyzes customer feedback to improve restaurant performance. You use data to help restaurants make better staffing decisions and drive customer loyalty. Your analysis can even track the number of times a customer requests a new dish or ingredient in order to revise restaurant menus.

Currently, you’re working with a vegetarian sandwich restaurant called Garden. The owner wants to make food deliveries more efficient and profitable. To accomplish this goal, your team will use delivery data to better understand when orders leave Garden, when they get to the customer, and overall customer satisfaction with the orders.

Before project kickoff, you attend a discovery session with the vice president of customer experience at Garden. He shares information to help your team better understand the business and project objectives. As a follow-up, he sends you an email with datasets.

Click below to read the email:

C3 Scenario 1_Client Email .pdf


And click below to access the datasets:

Course 3 Final Challenge Data Sets - Customer survey data (1)


Course 3 Final Challenge Data Sets - Delivery times_distance (1)


Reviewing the data enables you to describe how you will use it to achieve your client’s goals. First, you notice that all of the data is first-party data, which means that it was collected from outside sources.

  • True
  • False

Scenario 1 continued

Next, you review the customer satisfaction survey data. To use the template for the customer satisfaction survey data, click the link below and select “Use Template.”

Link to template: Customer Satisfaction Survey data

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

CustomerSurveyData - Customer survey data


You notice that the data in column E is an example of Boolean data. Why did you come to this conclusion?

  • It has each subject in multiple rows.
  • It is qualitative data with a set order or scale.
  • It is organized in a certain format, such as rows and columns.
  • It has only two possible values.
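One informal way to check whether a spreadsheet column looks Boolean is to count its distinct values. The sketch below does this for a made-up list of survey answers. The helper `looks_boolean` is hypothetical, and the check is only a heuristic: a small sample may happen to show just one of the two possible values.

```python
def looks_boolean(values):
    """Return True when a column holds exactly two distinct non-empty values."""
    distinct = {str(v).strip().lower() for v in values if str(v).strip()}
    return len(distinct) == 2


# Made-up survey answers standing in for column E.
order_accurate = ["Yes", "No", "yes", "Yes"]
print(looks_boolean(order_accurate))
```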

Scenario 1 continued

Now, you review the data on delivery times and the distance of customers from the restaurant.

To use the template for the dataset, click the link below and select “Use Template.”

Link to template: Delivery Times/Distance

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

DeliveryTimes_DistanceData - Delivery times_distance


The data in column D is an example of nominal data.

  • True
  • False

Scenario 2 continued

Consider and respond to the following question. Select all that apply.

Our data analytics team often uses both internal and external data. Describe the difference between the two.

  • External data came from a company’s own systems. Internal data came from the organization.
  • External data is often generated from within the company. Internal data is generated outside the organization.
  • Internal data came from a company’s own systems. External data comes from outside the organization.
  • Internal data is often generated from within the company. External data is generated outside the organization.

Scenario 2 continued

For your final question, your interviewer explains that Sewati Financial Services needs its clients’ trust, and this is an important responsibility for the data analytics team.

He asks you to identify which data analytics practice involves preserving a data subject’s information and activity any time a data transaction occurs.

  • Bias
  • Encryption
  • Data privacy
  • Sharing permissions
Scenario 1, questions 1-5

You’ve been working at a data analytics consulting company for the past six months. Your team helps restaurants use their data to better understand customer preferences and identify opportunities to become more profitable.

To do this, your team analyzes customer feedback to improve restaurant performance. You use data to help restaurants make better staffing decisions and drive customer loyalty. Your analysis can even track the number of times a customer requests a new dish or ingredient in order to revise restaurant menus.

Currently, you’re working with a vegetarian sandwich restaurant called Garden. The owner wants to make food deliveries more efficient and profitable. To accomplish this goal, your team will use delivery data to better understand when orders leave Garden, when they get to the customer, and overall customer satisfaction with the orders.

Before project kickoff, you attend a discovery session with the vice president of customer experience at Garden. He shares information to help your team better understand the business and project objectives. As a follow-up, he sends you an email with datasets.

Reviewing the data enables you to describe how you will use it to achieve your client’s goals. First, you notice that all of the data was collected by Garden employees using their own resources. What type of data does this describe?

  • Third-party data
  • First-party data
  • Nominal data
  • Qualitative data

Scenario 1 continued

The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. This is unstructured data, which means what?

  • It’s objective and measures facts.
  • It’s not organized in an easily identifiable manner.
  • It’s organized in a certain format.
  • It’s collected by a group directly from its audience and then sold.
  1. Scenario 1 continued

Next, you review the customer satisfaction survey data. To use the template for the customer satisfaction survey data, click the link below and select “Use Template.”

Link to template: Customer Satisfaction Survey data

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

The question in column E asks, “Was your order accurate? Please respond yes or no.” The responses listed in column E are an example of Boolean data.

  • True
  • False
  1. Scenario 1 continued

Now, you review the data on delivery times and the distance of customers from the restaurant.

To use the template for the dataset, click the link below and select “Use Template.”

Link to template: Delivery Times/Distance

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

The data in column E shows the duration of deliveries from Garden to customers. What type of data is this? Select all that apply.

  • Continuous data
  • Quantitative data
  • Qualitative data
  • Discrete data
  1. Scenario 1 continued

The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. This is an example of structured data.

  • True
  • False
  1. Scenario 1 continued

Now that you’re familiar with the data, you want to build trust with the team at Garden. You decide to impress them by taking the initiative to reach out to your social media followers. You explain that Garden is a new client, and you show them the pictures of Garden’s sandwich deliveries from the client file. Then, you ask them if they have any photos of sandwich deliveries that you can evaluate.

This is an example of going above and beyond expectations and a great way to build trust.

  • True
  • False
  1. Scenario 2, questions 6-10

You’ve completed this program and are interviewing for a junior data scientist position at a company called Sewati Financial Services.

So far, you’ve successfully completed the first interview with a recruiter. They arrange your second interview with the team at Sewati Financial Services.

You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Kai Harvey, the senior manager of strategy. After welcoming you, he begins the behavioral interview.

Consider and respond to the following question. Select all that apply.

Our data analytics team often surveys clients to get their feedback. If you were on the team, how would you ensure the process does not cause potential bias?

  • Make sure the wording of the survey question does not encourage a specific response from participants.
  • Include clients with disabilities in the survey sample.
  • Give participants enough time to answer each survey question.
  • Instruct participants to share their name and contact information.
  1. Scenario 2 continued

Consider and respond to the following question. Select all that apply.

Our data analytics team often uses external data. Where can you access useful external data?

  • A public database
  • An open-data website
  • Sewati Financial Services database in the cloud
  • Sewati Financial Services website
  1. Scenario 2 continued

Consider and respond to the following question. Select all that apply.

Our analysts often work within the same spreadsheet, but for different purposes. What tools would you use in such a situation?

  • Freeze the header rows
  • Sort the data to make it easier to understand, analyze, and visualize
  • Filter to show only the data that meets specific criteria
  • Encrypt the spreadsheet so only you can access it
  1. Scenario 2 continued

Next, your interviewer wants to better understand your knowledge of basic SQL commands. He asks: How would you write a query that retrieves only data about people who work in Boise from the Clients table in our database?
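
No answer options are listed for this question. As a rough sketch of the kind of filter it asks for, the snippet below runs a WHERE clause against an in-memory SQLite database from Python. The column names `name` and `city` and the sample rows are assumptions made for illustration, since the schema of the Clients table is not shown here.

```python
import sqlite3

# Hypothetical schema and rows; the real Clients table is not shown in the source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?)",
    [("Ana", "Boise"), ("Ben", "Denver"), ("Cy", "Boise")],
)

# WHERE keeps only the rows whose (assumed) city column equals 'Boise'.
boise_clients = conn.execute(
    "SELECT * FROM Clients WHERE city = 'Boise' ORDER BY name"
).fetchall()
print(boise_clients)
```

The SELECT, FROM, and WHERE structure is the part that carries over; the actual column name to filter on depends on the real table.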

  1. Scenario 2 continued

For your final question, your interviewer explains that Sewati Financial Services cares about data privacy. The company needs its clients’ trust, and this is an important responsibility for the data analytics team.

He asks: What does data privacy involve? Select all that apply.

  • Encryption and sharing permissions
  • Preserving a data subject’s information and activity any time a data transaction occurs
  • Putting privacy measures in place to protect people’s data
  • A person’s legal right to their data

Shuffle Q/A

  1. Scenario 1, questions 1-5

You’ve been working at a data analytics consulting company for the past six months. Your team helps restaurants use their data to better understand customer preferences and identify opportunities to become more profitable.

To do this, your team analyzes customer feedback to improve restaurant performance. You use data to help restaurants make better staffing decisions and drive customer loyalty. Your analysis can even track the number of times a customer requests a new dish or ingredient in order to revise restaurant menus.

Currently, you’re working with a vegetarian sandwich restaurant called Garden. The owner wants to make food deliveries more efficient and profitable. To accomplish this goal, your team will use delivery data to better understand when orders leave Garden, when they get to the customer, and overall customer satisfaction with the orders.

Before project kickoff, you attend a discovery session with the vice president of customer experience at Garden. He shares information to help your team better understand the business and project objectives. As a follow-up, he sends you an email with datasets.

Click below to read the email:

And click below to access the datasets:

Reviewing the data enables you to describe how you will use it to achieve your client’s goals. First, you notice that all of the data is first-party data. What does this mean?

  • It’s data that was collected from outside sources.
  • It’s data that was collected by Garden employees using the company’s own resources.
  • It’s a type of data that’s categorized without a set order.
  • It’s subjective data that measures qualities and characteristics.
  1. Scenario 1 continued

Next, you review the customer satisfaction survey data. To use the template for the customer satisfaction survey data, click the link below and select “Use Template.”

Link to template: Customer Satisfaction Survey data

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

You notice that the data in column E is an example of Boolean data. Why did you come to this conclusion?

  • It has each subject in multiple rows.
  • It is qualitative data with a set order or scale.
  • It has only two possible values.
  • It is organized in a certain format, such as rows and columns.
  1. Scenario 1 continued

Now, you review the data on delivery times and the distance of customers from the restaurant.

To use the template for the dataset, click the link below and select “Use Template.”

Link to template: Delivery Times/Distance

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

Fill in the blank: The data in column E is an example of _____ data. Select all that apply.

  • continuous
  • qualitative
  • discrete
  • quantitative
  1. Scenario 1 continued

The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. What type of data is this?

  • Ordinal
  • Unstructured
  • Discrete
  • Relational
  1. Scenario 1 continued

Now that you’re familiar with the data, you want to build trust with the team at Garden.

What actions should you take when working with their data? Select all that apply.

  • Keep the data safe by implementing data-security measures, such as password protection and user permissions.
  • Share the client’s data with other delivery restaurants to compare performance.
  • Post on social media that you’re working with Garden and would like feedback from any of your contacts who have ordered there before.
  • Organize the data using effective naming conventions.
  1. Scenario 2, questions 6-10

You’ve completed this program and are interviewing for a junior data scientist position at a company called Sewati Financial Services.

Click below to review the job description:

So far, you’ve successfully completed the first interview with a recruiter. They arrange your second interview with the team at Sewati Financial Services.

Click below to read the email from the human resources director:

You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Kai Harvey, the senior manager of strategy. After welcoming you, he begins the behavioral interview.

Consider and respond to the following question. Select all that apply.

Our data analytics team often surveys clients to get their feedback. If you were on the team, how would you ensure the sample is representative of the population as a whole?

  • Only include participants who can answer survey questions in a timely manner.
  • Make sure the sample is chosen at random.
  • Include clients with disabilities in the survey sample.
  • Use a randomized sample of the population that includes all genders.
  1. Scenario 2 continued

Next, your interviewer wants to better understand your knowledge of basic SQL commands. He asks: How would you write a query that retrieves only data about people who joined our firm in 2019 from the Clients table in our database?
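
No answer options are listed for this question either. One possible way to filter by year is sketched below, using Python's built-in sqlite3 module. The `join_date` column, its ISO `YYYY-MM-DD` text format, and the sample rows are all assumptions, since the real Clients table is not shown.

```python
import sqlite3

# Hypothetical schema: join_date stored as ISO text (YYYY-MM-DD).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (name TEXT, join_date TEXT)")
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?)",
    [
        ("Ana", "2018-11-30"),
        ("Ben", "2019-01-01"),
        ("Cy", "2019-12-31"),
        ("Dee", "2020-01-01"),
    ],
)

# BETWEEN is inclusive on both ends, so this keeps every date in 2019.
# ISO-formatted date strings sort in chronological order, which makes
# the text comparison behave like a date comparison.
joined_2019 = conn.execute(
    "SELECT name FROM Clients "
    "WHERE join_date BETWEEN '2019-01-01' AND '2019-12-31' "
    "ORDER BY name"
).fetchall()
print(joined_2019)
```

If the table stored the year in its own numeric column instead, a plain equality test in the WHERE clause would do the same job.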

  1. Scenario 2 continued

For your final question, your interviewer explains that Sewati Financial Services cares about its clients’ trust, and this is an important responsibility for the data analytics team. They do this by:

protecting clients from unauthorized access to their private data

ensuring freedom from inappropriate use of client data

getting consent to use someone’s data

He asks: Which data analytics practice does this describe?

  • Encryption
  • Bias
  • Data privacy
  • Sharing permissions
  1. Scenario 1 continued

Next, you review the customer satisfaction survey data. To use the template for the customer satisfaction survey data, click the link below and select “Use Template.”

Link to template: Customer Satisfaction Survey data

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

The question in column E asks, “Was your order accurate? Please respond yes or no.” What kind of data is this?

  • Second-party data
  • Ordinal data
  • Clean data
  • Boolean data
  1. Scenario 1 continued

The next thing you review is the file containing pictures of sandwich deliveries over a period of 30 days. This is unstructured data, which means what?

  • It’s organized in a certain format.
  • It’s not organized in an easily identifiable manner.
  • It’s objective and measures facts.
  • It’s collected by a group directly from its audience and then sold.
  1. Scenario 2 continued

Consider and respond to the following question. Select all that apply.

Our data analytics team often uses external data. Where can you locate useful external data?

  • Other financial businesses
  • Sewati Financial Services marketing department
  • Government sources
  • A professional finance association
  1. Scenario 2 continued

Consider and respond to the following question.

Our analysts often work within the same spreadsheet, but for different purposes. How could filtering help in this situation?

  • Filtering enables you to highlight the header row
  • Filtering enables you to sort the data in a meaningful order
  • Filtering simplifies a spreadsheet by only showing you the information you need
  • Filtering encrypts the spreadsheet so only you can access it

Course 4 – Process Data from Dirty to Clean

Week 1 – The importance of integrity

Fill in the blank: As a data analyst, you need to verify that your data is _____ to ensure your analysis and conclusions are accurate.

  • complete and valid
  • private and valid
  • manipulated and replicated
  • manipulated and valid

A data analyst is given a dataset for analysis. It includes data only about the total population of every country in the previous 20 years. Based on the available data, an analyst would have the full picture and be able to determine the reasons behind a certain country's population increase from 2016 to 2017.

  • True
  • False

A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: June 2014 Invoices

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

June 2014 Invoices - Sheet1

The analyst notices a limitation with the data in rows 8 and 9. What is the limitation?

  • Row 8 and row 9 show the wrong currency.
  • Row 9 needs more data.
  • Row 9 is a duplicate of row 8.
  • Row 8 is not in the correct format.
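
Duplicate rows like the ones this question is about can also be found programmatically. The sketch below counts identical rows with Python's `collections.Counter`. The invoice rows are made up for illustration and are not taken from the June 2014 Invoices file.

```python
from collections import Counter

# Hypothetical invoice rows: (client, date, amount).
invoices = [
    ("Client A", "2014-06-03", 250.0),
    ("Client B", "2014-06-04", 120.0),
    ("Client A", "2014-06-03", 250.0),  # exact repeat of the first row
]

# Count how often each identical row appears and keep those seen more than once.
row_counts = Counter(invoices)
duplicates = [row for row, count in row_counts.items() if count > 1]
print(duplicates)
```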

A data analyst is working on a project about the global supply chain. They have a dataset with lots of relevant data from Europe and Asia. However, they decide to generate new data that represents all continents. What type of insufficient data does this scenario describe?

  • Data from only one source
  • Data that’s outdated
  • Data that keeps updating
  • Data that’s geographically limited

In the data analysis process, how does a sample relate to a population?

  • A sample is a duplicate selection of data that is taken from the population.
  • A sample is an average of all the data that represents the population.
  • A sample is an ideal example taken from a population.
  • A sample is a part of a population that is representative of the population.

A restaurant wants to gather data about a new dish by giving out free samples and asking for feedback. Who should the restaurant give samples to?

  • Diners who spend the most money on their meal
  • All diners
  • Diners selected at random
  • Diners who are willing to pay for the samples
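
To make the idea of random selection concrete, here is a minimal sketch using Python's standard `random` module. The diner list, the sample size of 10, and the seed are arbitrary choices for illustration.

```python
import random

# Hypothetical population: 100 diners identified by label.
diners = [f"diner_{i}" for i in range(1, 101)]

# A seeded generator makes the draw repeatable between runs.
rng = random.Random(42)

# sample() draws without replacement, so every diner has the same
# chance of selection and nobody is picked twice.
sample = rng.sample(diners, k=10)
print(sample)
```

Because selection does not depend on spending or willingness to pay, the draw avoids favoring one group of diners over another.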
  1. Fill in the blank: If a data analyst is using data that has been _____, the data will lack integrity and the analysis will be faulty.
  • wide
  • compromised
  • public
  • clean
  1. A financial analyst imports a dataset to their computer from a storage device. As it’s being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise?
  • Data analysis
  • Data gathering
  • Data manipulation
  • Data transfer
  1. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Based on the available data, an analyst would be able to determine the reasons behind a certain country's population increase from 2016 to 2017.
  • True
  • False
  1. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: June 2014 Invoices

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

Which of the following has duplicate data?

  • Data for Symteco on 2/21/2014
  • Data for Symteco on 5/20/2014
  • Data for Valando on 2/18/2014
  • Data for Valando on 1/1/2014
  1. A data analyst at a nonprofit organization is working with a dataset about a summer fundraiser. Although they have a lot of useful data by the end of the month, they recognize that the data is insufficient. So, they decide to wait until the end of the season to begin working with the dataset. Which type of insufficient data does this example describe?
  • Outdated data
  • Data from only one source
  • Geographically limited data
  • Data that keeps updating
  1. When gathering data through a survey, companies can save money by surveying 100% of a population.
  • True
  • False

7. Fill in the blank: Sampling bias in data collection happens when a sample isn’t representative of _____.

  • the population as a whole
  • a dataset about the population
  • a subset of the population
  • the population most affected by the data
  1. Data and business objectives might not align for a number of reasons. Which of the following issues can prevent alignment? Select all that apply.
  • Sampling bias
  • Data integrity
  • Data visualization
  • Insufficient data

Shuffle Q/A

  1. Which of the following conditions are necessary to ensure data integrity? Select all that apply.
  • Privacy
  • Completeness
  • Statistical power
  • Accuracy
  1. What is one potential problem associated with data manipulation that analysts must be aware of?
  • Data manipulation can separate a dataset among different locations.
  • Data manipulation can help organize a dataset.
  • Data manipulation can introduce errors.
  • Data manipulation can make a dataset easier to read.
  1. As a data analyst, you are working for a national pizza restaurant chain. You have a dataset with monthly order totals for each branch over the past year. With only this data, what questions can you answer?
  • Which region had the highest sales over the last two years?
  • Which branch will be the most profitable over the next year?
  • What was the most popular item on the menu?
  • Which branch had the most orders in the last month of last year?
  1. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: June 2014 Invoices

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

June 2014 Invoices - Sheet1

The data analyst is asked to find the average estimate for Symteco over the past three years. What limitation of the data makes this impossible?

  • The data uses the wrong currency.
  • The data is all from a single year.
  • The data does not include Symteco.
  • The data does not include estimates.
  1. A data analyst at a software company wants to learn more about industry competitors. Because the software industry has more mergers than any other field, the companies and their products are constantly evolving. The analyst has a dataset from three years ago, and they notice that many of the companies and products in the dataset have changed. What makes the analyst decide that the data is insufficient, so they should generate fresh data instead?
  • It is outdated data.
  • It is geographically limited data.
  • It is data that keeps updating.
  • It is data from only one source.
  1. A restaurant gathers data about a new dish by providing free samples to parties of six or more diners. What does this scenario describe?
  • Random sampling
  • Unbiased sampling
  • Geographically limited sampling
  • Sampling bias
  1. Which of the following processes helps ensure a close alignment of data and business objectives?
  • Completing data replication
  • Transferring data multiple times
  • Maintaining data integrity
  • Having data update automatically during analysis
  1. What can jeopardize data integrity throughout its lifecycle? Select all that apply.
  • Insufficient data
  • Human error
  • Malware
  • System failures
  1. A healthcare company keeps copies of their data at several locations across the country. The data becomes compromised because each location creates a copy of the original at different times of day. Which of the following processes caused the compromise?
  • Data gathering
  • Data manipulation
  • Data transfer
  • Data replication
  1. A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Which of the following questions would the analyst need more data to address?
  • Which country had the smallest population in 2017?
  • Which country had the greatest population in 2015?
  • What was the reason for the population increase in a certain country?
  • What was the population of a certain country in 2020?
  1. A data analyst is given a dataset for analysis. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: June 2014 Invoices

OR

If you don’t have a Google account, download the CSV file directly from the attachment below.

June 2014 Invoices - Sheet1

Which of the following are limitations of this dataset?

  • Identifying the most profitable clients between January and November of 2014
  • Identifying the least profitable clients between January and November of 2014
  • Identifying the worst paying client between March and December of 2014
  • Identifying the best paying client between January and November of 2014
  1. A car manufacturer wants to learn more about the brand preferences of electric car owners. There are millions of electric car owners in the world. Who should the company survey?
  • A sample of all electric car owners
  • The entire population of electric car owners
  • A sample of car owners who have owned more than one electric car
  • A sample of car owners who most recently bought an electric car
  1. A candy manufacturer finds an even distribution of sales across all age ranges of customers who purchase their products. The manufacturer decides to conduct a survey to learn more about its customer base. Due to age requirements, they can only send the survey to customers who are 21 years or older. This scenario can be described as what?
  • Down sampling bias
  • Sampling bias
  • Unbiased sampling
  • Upsampling bias
  1. What best describes a sample size?
  • A subset of the population between the 25th and 50th percentile
  • A random subset of the population
  • A subset that is representative of the population as a whole
  • A subset of the population excluding outliers
  1. Fill in the blank: In order to have a strong and thorough analysis, a data analyst must verify _____.
  • data replication
  • data manipulation
  • data engineering
  • data integrity
  1. Fill in the blank: _____ is the process of changing data to make it more organized and easier to read.
  • Data transfer
  • Data manipulation
  • Data gathering
  • Data replication
  1. You are working for a global technology company. You have a dataset with the company’s total cell phone sales by country from 2015 to present. Based on the data you have, what questions are you able to answer?
  • What was the effect on sales when a new phone model was launched?
  • What was the effect on sales when new phone features were introduced?
  • What countries have the most cell phone sales in the past three years?
  • What are the mean cell phone sales for each country since 2010?
  1. A data analyst, working for a publishing company, gathers a dataset which includes all books sold in the United Kingdom over the last three years. However, they decide to generate new data that represents global book sales. What type of insufficient data does this scenario describe?
  • Data that keeps updating
  • Data that is outdated
  • Data that is geographically limited
  • Data from only one source
  1. A company is trying to learn more about their customer base. They would like to conduct a survey to understand why their customers chose their brand. How should the company survey its customers?
  • Conduct a survey of customers who purchased a different brand
  • Conduct a survey of customers that live in high-income areas
  • Conduct a survey with a representative sample of their customer population
  • Conduct a survey with customers who have purchased more than five products
  1. Sometimes during analysis, an analyst discovers that it’s necessary to adjust the business objective. When this happens, the analyst should take the initiative to do so without involving others in order to be respectful of their time.
  • True
  • False
  1. A car dealership gathers data about their entire customer population. They decide to conduct a survey to understand why their customers chose their dealership. They send out an email to all customers who have purchased more than two vehicles in the past five years. What does this scenario describe?
  • Unbiased sampling
  • Geographically limited sampling
  • Random sampling
  • Sampling bias
  1. A data analyst needs to migrate data from a server located at their company's headquarters to a remote site. This can lead to what type of data integrity issue?
  • Data replication
  • Data cleaning
  • Data transfer
  • Data manipulation
  1. As a data analyst, you work with data about the life expectancy of sea turtles in the Coral Triangle. The dataset contains an estimated birth date and death date for all tracked sea turtles. With the data you have, what questions are you able to answer?
  • What is the median age a sea turtle has lived in the Coral Triangle?
  • Where is the most prevalent location sea turtles are being hatched in the Coral Triangle?
  • What is the largest sea turtle ever recorded?
  • Is the sea turtle population increasing throughout the world?
  1. A clothing manufacturer wants to learn more about why their consumers have purchased the brand’s products. How should this manufacturer conduct their survey?
  • Send the survey to a representative sample of their customers
  • Send the survey to customers who have purchased more than one product
  • Send the survey to their least frequent customers
  • Send the survey to random people who buy clothes

Week 2 – Sparkling-clean data

Fill in the blank: Conditional formatting is a spreadsheet tool that changes how _____ appear when values meet a specific condition.

  • charts
  • filters
  • cells
  • queries

For a function to work properly, data analysts must follow each function’s predetermined structure. What is this structure called?

  • Validation
  • Syntax
  • Algorithm
  • Summary

An analyst is cleaning a new dataset. They want to make sure the data contained from cell B2 through cell B100 does not contain a number smaller than 10. Which COUNTIF function syntax can be used to answer this question?

  • =COUNTIF(B2:B100,"<9")
  • =COUNTIF(B2:B100,">=10")
  • =COUNTIF(B2:B100,>50)
  • =COUNTIF(B2:B200, "<=50")
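
As a rough illustration of what a COUNTIF check with a "less than 10" criterion computes, the Python sketch below counts the values in a range that fall under a threshold. The values are invented and stand in for cells B2 through B100.

```python
# Hypothetical column values standing in for the spreadsheet range.
values = [12, 9, 30, 10, 4, 18]

# Equivalent in spirit to a COUNTIF with a "<10" criterion:
# count the cells whose value is strictly below 10.
below_ten = sum(1 for v in values if v < 10)
print(below_ten)
```

A count of zero would confirm that the range contains no number smaller than 10. Any other count tells the analyst how many cells need attention.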

VLOOKUP searches for a value in a row in order to return a corresponding piece of information.

  • True
  • False

To evaluate how well two or more data sources work together, data analysts use data mapping.

  • True
  • False
  1. As part of the data-cleaning process, a data analyst creates a rule to highlight any empty cells in a bright blue color. This is an example of data visualization.
  • True
  • False

2. A data analyst at a nonprofit organization is working with the following spreadsheet, which contains member name data in column C. They want to divide this data using the underscore as a delimiter, so that first names are stored in one column and last names in another. Which tool should the analyst use?

  • Conditional formatting
  • Pivot table
  • SPLIT function
  • MID function
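
Splitting on a delimiter can be sketched in Python with `str.split`. The sample name and the firstname_lastname order are assumptions, since the spreadsheet itself is not shown here.

```python
# Hypothetical cell value; the underscore is the delimiter.
full_name = "Maria_Lopez"

# split() cuts the string at each underscore and returns the pieces,
# much as a spreadsheet split places each fragment in its own cell.
first_name, last_name = full_name.split("_")
print(first_name, last_name)
```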

3. Fill in the blank: When describing a SUM function, the _____ is =SUM(value 1 through value 2).

  • syntax
  • standard
  • structure
  • script

4. You are working with the following selection of a spreadsheet:

In order to extract the five-digit postal code from Burlington, MA, what is the correct function?

  • =RIGHT(B3,5)
  • =RIGHT(5,B3)
  • =LEFT(5,B3)
  • =LEFT(B3,5)
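
The difference between taking characters from the left and from the right of a string can be sketched with Python slicing. The address string is invented, because the spreadsheet selection is not reproduced here. The sketch assumes the postal code sits at the end of the cell.

```python
# Hypothetical cell contents with the postal code at the end.
cell = "Anytown, ZZ 12345"

# Like RIGHT(cell, 5): the last five characters.
right_five = cell[-5:]

# Like LEFT(cell, 5): the first five characters.
left_five = cell[:5]

print(right_five, left_five)
```

In both spreadsheet functions the text comes first and the number of characters second, which mirrors the order used in the comments above.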
  1. A data analyst in a human resources department is working with the following selection of a spreadsheet:

They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number (SS#). What function will create the ID 20093208 for the employee in row 5?

  • =CONCATENATE(A5!B5)
  • =CONCATENATE(A5*B5)
  • =CONCATENATE(A5+B5)
  • =CONCATENATE(A5,B5)
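
Concatenation joins values as text rather than adding them as numbers. The Python sketch below shows the difference using the two parts of the ID 20093208 from the question. Which spreadsheet column holds which part is an assumption here.

```python
# The two parts of the ID from the question: year hired and last four SS# digits.
year_hired = 2009
ssn_last4 = 3208

# Joining as text places the digits side by side.
employee_id = f"{year_hired}{ssn_last4}"

# Adding as numbers produces an unrelated total instead.
arithmetic_sum = year_hired + ssn_last4

print(employee_id, arithmetic_sum)
```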
  1. A data analyst at an e-commerce company is working with a spreadsheet containing last month's sales. The most expensive product their company sells costs $49.99, so they want to quickly confirm that all of the data in the Sales column is $49.99 or less. What function can they use?
  • SUMIF
  • COUNTIF
  • COUNT
  • SUM
  1. A data analyst wants to search for a certain value in a column, then return a corresponding piece of information. Which function should they use?
  • VALUE
  • VLOOKUP
  • MATCH
  • FIND
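
The column-lookup behavior this question describes can be approximated in a few lines of Python. This is only a sketch of an exact-match lookup, not a reimplementation of the spreadsheet function. The helper name `lookup_in_first_column` and the product rows are hypothetical.

```python
# Hypothetical table: (product_id, name, price).
rows = [
    ("P-100", "Widget", 4.99),
    ("P-200", "Gadget", 12.50),
    ("P-300", "Gizmo", 49.99),
]

def lookup_in_first_column(search_key, table, index):
    """Return the value in 1-based column `index` of the first row
    whose first column equals `search_key`, or None if no row matches."""
    for row in table:
        if row[0] == search_key:
            return row[index - 1]
    return None

print(lookup_in_first_column("P-200", rows, 3))
```

The search runs down the first column and the result comes from another column of the same row.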
  1. A data analyst needs to combine two datasets. Each dataset comes from a different system, and the systems store data in different ways. What can the data analyst do to ensure the data is compatible?
  • Use a data visualization
  • Map the data
  • Apply a data structure
  • Merge the data
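
One simple form of data mapping is matching field names from a source system to the names a destination system expects. The sketch below shows that idea in Python. The field names and the `map_record` helper are hypothetical.

```python
# Hypothetical mapping from one system's field names to another's.
FIELD_MAP = {"cust_name": "customer_name", "zip": "postal_code"}

def map_record(record, field_map):
    """Rename the keys of `record` according to `field_map`,
    leaving unmapped keys unchanged."""
    return {field_map.get(key, key): value for key, value in record.items()}

source_record = {"cust_name": "Ana", "zip": "12345", "email": "ana@example.com"}
mapped = map_record(source_record, FIELD_MAP)
print(mapped)
```

Once both datasets share the same field names, combining them becomes much less error-prone.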

Shuffle Q/A

  1. In their spreadsheet, a data analyst makes cells stand out for more efficient analysis. What spreadsheet tool is used to do this?
  • Cell filtering
  • Conditional ranking
  • Conditional formatting
  • Cell querying
  1. A data analyst uses the SPLIT function to divide a text string around a specified character and put each fragment into a new, separate cell. What is the specified character separating each item called?
  • Unit
  • Delimiter
  • Partition
  • Substring
  1. A data analyst is using a function in a spreadsheet. For the function to work correctly, they follow the function’s syntax. What does this entail?
  • It is the function’s name and placement.
  • It is how the function can be used in a program.
  • It is the function’s required information and its proper placement.
  • It is the purpose of the function and its use.
  1. In a spreadsheet, what is the correct function for extracting the first two characters of the string located in cell A7?
  • =LEFT(A7,2)
  • =LEFT(2,A7)
  • =RIGHT(A7,2)
  • =RIGHT(2,A7)
  1. Fill in the blank: In a spreadsheet, the function VLOOKUP is used to _____ information in a column based on a specified data value.
  • return
  • replace
  • transform
  • delete
  1. What describes syntax?
  • It is the function’s required information and its proper placement.
  • It is how the function can be used in a program.
  • It is the purpose of the function and its use.
  • It is the function’s name and placement.
  1. A data analyst in a human resources department is working with the following selection of a spreadsheet:

They want to create employee identification numbers (IDs) in column D. The IDs should include the last four digits of the employee’s Social Security Number (SS#) plus the year hired. What function will create the ID 19392020 for the employee in row 4?

  • =CONCATENATE(B4+A4)
  • =CONCATENATE(B4,A4)
  • =CONCATENATE(A4+B4)
  • =CONCATENATE(A4!B4)
  1. An analyst is cleaning a new dataset. They want to determine how many of the cells in column F have a value of 0. However, they only want rows 7 to 120 to be considered. Which COUNTIF function syntax can be used to answer this question?
  • =COUNTIF(F2:F1250, 0)
  • =COUNTIF(F7:F120, =0)
  • =COUNTIF(F7:F120,"0")
  • =COUNTIF(F7:F120,"=0")
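Restricting a count to a range of rows, as COUNTIF does with F7:F120, can be sketched in Python as follows. The column data is invented so that the result is easy to check by hand.

```python
# Hypothetical column F, stored as row number -> value.
# Every tenth row holds 0; all other rows hold a non-zero value.
column_f = {row: (0 if row % 10 == 0 else row) for row in range(2, 1251)}

# Count zeros in rows 7 through 120 only, ignoring the rest of the column.
zero_count = sum(1 for row in range(7, 121) if column_f[row] == 0)
print(zero_count)
```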
  1. A data analyst needs to combine two datasets. Each dataset comes from a different system, and the systems store data in different ways. What can the data analyst do to ensure the data is compatible prior to analyzing the data?
  • Use a data visualization
  • Map the data
  • Spot check for null values
  • Apply a data structure
  1. A data analyst is working on a spreadsheet in which one of the columns contains name data. This data is formatted as lastname_firstname. The analyst splits this data at the underscore so that each piece (firstname and lastname) is contained in its own column.

In this context, what is the underscore acting as?

  • Partition
  • Delimiter
  • Substring
  • MID function
  1. A data analyst is using a function in a spreadsheet. When they input the function, they follow a predetermined structure that includes all required information for the function and its proper placement. What aspect of a function does this describe?
  • The specified value of the function
  • The syntax of the function
  • The length of the function
  • The number of characters in the function
  1. You are working with the following selection of a spreadsheet:

In order to extract the five-digit postal code from Brandon, FL, what is the correct function?

  • =RIGHT(5,B4)
  • =RIGHT(B4,5)
  • =LEFT(B4,5)
  • =LEFT(5,B4)
  1. A data analyst in a human resources department is working with the following selection of a spreadsheet:

They want to create employee identification numbers (IDs) in column D. The IDs should include the last four digits of the employee’s Social Security Number (SS#) plus the year hired. What function will create the ID 32082009 for the employee in row 5?

  • =CONCATENATE(B5,A5)
  • =CONCATENATE(A5!B5)
  • =CONCATENATE(A5+B5)
  • =CONCATENATE(B5+A5)
  1. Before analyzing a dataset, an analyst maps the data. What is the reason for doing this?
  • The analyst wants to know what attributes the data has.
  • The analyst thinks the dataset might have some null values.
  • The dataset has no visualizations.
  • The dataset contains data from different sources.
  1. A data analyst suspects that there are many blank cells in their spreadsheet corresponding to missing information. What spreadsheet tool can they use to identify only those cells containing the null values?
  • Conditional ranking
  • Conditional formatting
  • Cell querying
  • Cell filtering
  1. A data analyst is working on a spreadsheet in which one of the columns is name data. This data is formatted as lastname, firstname. The analyst chooses to divide this data into two new columns, one containing the firstname data and the other containing the lastname data. What spreadsheet tool would they use to do this?
  • The MID function
  • The SPLIT function
  • Substring formatting
  • Conditional formatting
  1. Fill in the blank: The function _____ is used to return information in a column that contains a specified value.
  • VALUE
  • MATCH
  • VLOOKUP
  • FIND
  1. In a spreadsheet, what function would you use to extract the last three characters of the string located in row 4, column C?
  • =RIGHT(3,C4)
  • =LEFT(C4,3)
  • =LEFT(3,C4)
  • =RIGHT(C4,3)

Week 3 – Cleaning data with SQL

In which of the following situations would a data analyst use spreadsheets instead of SQL? Select all that apply.

  • When working with a small dataset
  • When visually inspecting data
  • When using a language to interact with multiple database programs
  • When working with a dataset with more than 1,000,000 rows

In SQL databases, what data type is the value 78.99 an example of?

  • Integer
  • String
  • Boolean
  • Float
  1. Fill in the blank: Data analysts usually use _____ to deal with very large datasets.
  • web browsers
  • spreadsheets
  • SQL
  • word processors

  1. What are some of the benefits of using SQL for analysis? Select all that apply.

  • SQL interacts with database programs.
  • SQL tracks changes across a team.
  • SQL has built-in functionalities.
  • SQL can pull information from different database sources.
  1. A data analyst creates many new tables in their company’s database. When the project is complete, the analyst wants to remove the tables so they don’t clutter the database. What SQL commands can they use to delete the tables?
  • CREATE TABLE IF NOT EXISTS
  • DROP TABLE IF EXISTS
  • UPDATE
  • INSERT INTO
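The following sketch shows how CREATE TABLE IF NOT EXISTS and DROP TABLE IF EXISTS behave. It assumes SQLite via Python's standard `sqlite3` module, which may differ from the database used in the course; the table name `temp_results` and the helper `table_names` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS temp_results (id INTEGER)")


def table_names(connection):
    """List the tables currently defined in this SQLite database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return [name for (name,) in rows]


before = table_names(conn)
conn.execute("DROP TABLE IF EXISTS temp_results")
# Running the statement again does not raise an error, because of IF EXISTS.
conn.execute("DROP TABLE IF EXISTS temp_results")
after = table_names(conn)
print(before, after)
```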
  1. You are working with a database table that contains invoice data. The table includes columns for invoice_id and customer_id. You want to remove duplicate entries for customer ID and sort the results by invoice ID.

You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.

NOTE: The three dots (...) indicate where to add the clause.

What customer ID number appears in row 12 of your query result?

  • 23
  • 42
  • 16
  • 8
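The quiz query itself is not reproduced in this text, so the sketch below only illustrates what DISTINCT does. It assumes SQLite via Python's `sqlite3` module and uses invented invoice rows; the results are sorted by customer_id here simply to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, 2), (2, 4), (3, 2), (4, 8), (5, 4)],  # made-up sample data
)

# DISTINCT collapses repeated customer_id values into one row each.
rows = conn.execute(
    "SELECT DISTINCT customer_id FROM invoice ORDER BY customer_id"
).fetchall()
distinct_ids = [customer_id for (customer_id,) in rows]
print(distinct_ids)
```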
  1. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check for postal codes that are greater than 7 characters long.

You write the SQL query below. Add a LENGTH function that will return any postal codes that are greater than 7 characters long.

What is the last name of the customer that appears in row 10 of your query result?

  • Rocha
  • Brooks
  • Hughes
  • Ramos
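Filtering on LENGTH can be sketched as shown below. This assumes SQLite via Python's `sqlite3` module, and the customer rows are invented, so the output says nothing about the quiz dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (last_name TEXT, postal_code TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [
        ("Alpha", "12345"),
        ("Beta", "12345-6789"),
        ("Gamma", "H2G 1A7"),
        ("Delta", "70040-020"),
    ],  # made-up sample data
)

# Keep only the rows whose postal code is longer than 7 characters.
rows = conn.execute(
    "SELECT last_name FROM customer "
    "WHERE LENGTH(postal_code) > 7 ORDER BY last_name"
).fetchall()
long_codes = [last_name for (last_name,) in rows]
print(long_codes)
```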
  1. A data analyst is cleaning transportation data for a ride-share company. The analyst converts the data on ride duration from text strings to floats. What does this scenario describe?
  • Visualizing
  • Processing
  • Calculating
  • Typecasting
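Typecasting text to floats can be illustrated with CAST. The sketch assumes SQLite via Python's `sqlite3` module, where the floating-point type is named REAL; other databases use names such as FLOAT or FLOAT64. The `rides` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (duration TEXT)")
conn.executemany(
    "INSERT INTO rides VALUES (?)",
    [("12.5",), ("3",), ("47.25",)],  # durations stored as text strings
)

# CAST converts each text value to SQLite's floating-point type.
rows = conn.execute(
    "SELECT CAST(duration AS REAL) FROM rides ORDER BY rowid"
).fetchall()
durations = [value for (value,) in rows]
print(durations)
```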
  1. The CAST function can be used to convert the DATE datatype to the DATETIME datatype.
  • True
  • False
  1. Fill in the blank: The _____ function can be used to return non-null values in a list.
  • TRIM
  • COALESCE
  • CAST
  • CONCAT
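A small sketch of COALESCE, which returns the first non-null value among its arguments. It assumes SQLite via Python's `sqlite3` module; the `people` table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (nickname TEXT, full_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Sam", "Samuel"), (None, "Priya"), (None, None)],  # made-up rows
)

# For each row, take the first argument that is not NULL.
rows = conn.execute(
    "SELECT COALESCE(nickname, full_name, 'unknown') FROM people ORDER BY rowid"
).fetchall()
display_names = [name for (name,) in rows]
print(display_names)
```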
  1. You are working with a database table that contains employee data. The table includes columns about employee location such as city, state, country, and postal_code. You want to retrieve the first 3 characters of each postal code. You decide to use the SUBSTR function to retrieve the first 3 characters of each postal code, and use the AS command to store the result in a new column called new_postal_code.

You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each postal code and store the result in a new column as new_postal_code.

NOTE: The three dots (...) indicate where to add the statement.

What employee ID number appears in row 5 of your query result?

NOTE: The query index starts at 1 not 0.

  • 3
  • 1
  • 8
  • 7
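The SUBSTR plus AS pattern described in the question can be sketched like this. It assumes SQLite via Python's `sqlite3` module, where the third argument of SUBSTR is the number of characters to return; the employee rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, postal_code TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, "T5K 2N1"), (2, "T2P 2T3")],  # made-up sample data
)

# Start at character 1 and return 3 characters, aliased with AS.
rows = conn.execute(
    "SELECT employee_id, SUBSTR(postal_code, 1, 3) AS new_postal_code "
    "FROM employee ORDER BY employee_id"
).fetchall()
print(rows)
```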

Shuffle Q/A

  1. Why do data analysts choose to work with SQL? Select all that apply.
  • SQL can handle huge amounts of data.
  • SQL is a powerful software program.
  • SQL is a well-known standard in the professional community.
  • SQL is a programming language that can also create web apps.
  1. A team of data analysts is working on a large project that will take months to complete and contains a huge amount of data. They need to document their process and communicate with multiple databases. The team decides to use a SQL server as the main analysis tool for this project and SQL for the queries. What makes this the most efficient tool? Select all that apply.
  • SQL efficiently handles large amounts of data.
  • SQL records queries and changes throughout a project.
  • SQL contains commands that build visualizations.
  • SQL allows you to connect to multiple databases.
  1. Fill in the blank: _____ refers to the process of converting data from one type to another.
  • Formatting
  • Cleaning
  • Typecasting
  • Querying
  1. A data analyst is working with product sales data. They import new data into a database. The database recognizes the data for product price as text strings. What SQL function can the analyst use to convert text strings to floats?
  • LENGTH
  • TRIM
  • SUBSTR
  • CAST
  1. Fill in the blank: The _____ function can be used to join strings to create a new column.
  • CAST
  • COALESCE
  • TRIM
  • CONCAT
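Joining strings into a new column can be sketched as follows. Note the substitution: this example uses SQLite's `||` operator through Python's `sqlite3` module, because a CONCAT function is not available in every SQLite version; in databases that provide CONCAT, the idea is the same. The `product` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("shirt", "blue"), ("shirt", "red")],  # made-up sample data
)

# Join two text columns into one new column that can serve as a key.
rows = conn.execute(
    "SELECT name || '_' || color AS product_key FROM product ORDER BY rowid"
).fetchall()
product_keys = [key for (key,) in rows]
print(product_keys)
```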
  1. As a data analyst, you are working on a quick project containing a small amount of data. As the data was emailed to you, there is no need to query the data. What tool should you use to perform your analysis?
  • Spreadsheet
  • SQL
  • Word processor
  • CSV
  1. A data analyst has added a massive table to their database by accident and needs to remove the table. What command can the analyst use to correct their mistake?
  • DROP TABLE IF NOT EXISTS
  • INSERT INTO
  • REMOVE TABLE IF EXISTS
  • DROP TABLE IF EXISTS
  1. You are working with a database table that contains invoice data. The table includes a column for customer_id. You want to remove duplicate entries for customer_id and get a count of total customers in the database.

You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the customer_id column.

NOTE: The three dots (...) indicate where to add the clause.

What is the total number of customers in the database?

  • 84
  • 105
  • 43
  • 59
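Counting unique customers combines COUNT with DISTINCT. The sketch below assumes SQLite via Python's `sqlite3` module and uses invented rows, so the number it prints is unrelated to the quiz dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, 10), (2, 11), (3, 10), (4, 12), (5, 11)],  # made-up sample data
)

# COUNT(DISTINCT ...) counts each customer_id once, however often it repeats.
(total_customers,) = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM invoice"
).fetchone()
print(total_customers)
```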
  1. In SQL databases, what data type refers to a number that does not contain a decimal?
  • String
  • Integer
  • Boolean
  • Float
  1. After joining multiple tables, you find your data contains a significant number of null values. What function can you use to return only the non-null values in a list?
  • CAST
  • COALESCE
  • TRIM
  • CONCAT
  1. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, and country. The state names are abbreviated. You want to retrieve the first 2 letters of each state name. You decide to use the SUBSTR function to retrieve the first 2 letters of each state name, and use the AS command to store the result in a new column called new_state.

You write the SQL query below. Add a statement to your SQL query that will retrieve the first 2 letters of each state name and store the result in a new column as new_state.

NOTE: The three dots (...) indicate where to add the statement.

NOTE: SUBSTR takes three arguments: column, starting_index, and length (the number of characters to return).

What customer ID number is in row 9 of your query result?

NOTE: The query index starts at 1 not 0.

  1. A junior data analyst joins a new company. The analyst learns that SQL is heavily utilized within the organization. Why would the organization choose to invest in SQL? Select all that apply.
  • SQL is a programming language that can also create web apps.
  • SQL can handle huge amounts of data.
  • SQL is a powerful software program.
  • SQL is a well-known standard in the professional community.
  1. You are working with a database table that contains invoice data. The table includes columns for invoice_id and billing_state. You want to remove duplicate entries for billing state and sort the results by invoice ID.

You write the SQL query below. Add a DISTINCT clause that will remove duplicate entries from the billing_state column.

NOTE: The three dots (...) indicate where to add the clause.

What billing state appears in row 17 of your query result?

NOTE: The query index starts at 1 not 0.

  • AZ
  • NV
  • CA
  • WI
  1. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to check for city names that are greater than 9 characters long.

You write the SQL query below. Add a LENGTH function that will return any city names that are greater than 9 characters long.

What is the first name of the customer that is in row 7 of your query result?

NOTE: The query index starts at 1 not 0.

  • Diego
  • Kara
  • Julia
  • Roberto
  1. In SQL databases, what data type refers to a number that contains a decimal?
  • Boolean
  • Float
  • Integer
  • String
  1. You’re working with a dataset that contains a float column with a significant amount of decimal places. This level of granularity is not needed for your current analysis. How can you convert the data in the float column to be integer data?
  • CAST
  • COALESCE
  • TRIM
  • CONCAT
  1. What SQL function lets you add strings together to create new text strings that can be used as unique keys?
  • CAST
  • COALESCE
  • TRIM
  • CONCAT
  1. What are some of the benefits of using SQL for analysis? Select all that apply.
  • SQL interacts with database programs.
  • SQL has better user management than spreadsheets.
  • SQL can pull information from different database sources.
  • SQL tracks changes across a team.
  1. A data analyst is managing a database of customer information for a retail store. What SQL command can the analyst use to add a new customer to the database?
  • UPDATE
  • CREATE TABLE IF NOT EXISTS
  • DROP TABLE IF EXISTS
  • INSERT INTO
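Adding rows with INSERT INTO can be sketched as shown below, assuming SQLite via Python's `sqlite3` module. The `customer` table layout and the names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT)")

# Insert one new customer, then add a few that were missed.
conn.execute(
    "INSERT INTO customer (customer_id, name) VALUES (?, ?)", (1, "Ada")
)
conn.executemany(
    "INSERT INTO customer (customer_id, name) VALUES (?, ?)",
    [(2, "Grace"), (3, "Linus")],
)

rows = conn.execute("SELECT name FROM customer ORDER BY customer_id").fetchall()
names = [name for (name,) in rows]
print(names)
```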
  1. In SQL databases, True/False values refer to what data type?
  • String
  • Float
  • Integer
  • Boolean
  1. A data analyst is tasked with identifying what orders are still in transit. The current list of orders contains trillions of rows. What is the best tool for the analyst to use?
  • Spreadsheets
  • CSV
  • SQL
  • Word processor
  1. Your manager tasks you with analyzing a dataset and visually inspecting the data. Upon initial inspection you realize that this is a small dataset. What tool should you use to analyze the data?
  • CSV
  • Spreadsheet
  • SQL
  • Word processor
  1. A data analyst creates a database to store information on the company's customer data. When completing the initial import the analyst notices that they forgot to add a few customers into the table. What command can the analyst use to add these missed customers?
  • ADD
  • APPEND
  • INSERT INTO
  • DROP
  1. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. You want to find what state names are greater than 3 characters.

You write the SQL query below. Add a LENGTH function that will return any state names that are greater than 3 characters long.

What state is in row 1 of your query result?

NOTE: The query index starts at 1 not 0.

  • India
  • Chile
  • Dublin
  • Ireland
  1. In SQL databases, what function can be used to convert data from one datatype to another?
  • CAST
  • LENGTH
  • TRIM
  • SUBSTR
  1. After a company merger, a data analyst receives a dataset with billions of rows of data. They need to leverage this data to identify insights for upper management. What tool would be most efficient for the analyst to use?
  • Spreadsheet
  • Word processor
  • SQL
  • CSV
  1. You are working with a database table that contains customer data. The table includes columns about customer location such as city, state, country, and postal_code. The state names are abbreviated. You want to check for state names that are greater than 2 characters long.

You write the SQL query below. Add a LENGTH function that will return any state names that are greater than 2 characters long.

What country is in row 1 of your query result?

NOTE: The query index starts at 1 not 0.

  • Ireland
  • India
  • France
  • Chile
  1. You are working with a database table that contains employee data. The table includes columns about employee location such as city, state, country, and postal_code. You use the SUBSTR function to retrieve the first 3 characters of each last_name, and use the AS command to store the result in a new column called new_last_name.

You write the SQL query below. Add a statement to your SQL query that will retrieve the first 3 characters of each last_name and store the result in a new column as new_last_name.

NOTE: The three dots (...) indicate where to add the statement.

NOTE: SUBSTR takes three arguments: column, starting_index, and length (the number of characters to return).

What employee ID number is in row 8 of your query result?

NOTE: The query index starts at 1 not 0.

  • 7
  • 3
  • 1
  • 8

Week 4 – Verify and report on your cleaning results

What is involved in seeing the big picture when verifying data cleaning? Select all that apply.

  • Consider the reporting
  • Consider the data
  • Consider the business problem
  • Consider the goal

Fill in the blank: A data analyst uses the CASE statement to consider one or more _____, then return a value.

  • identifications
  • conditions
  • changes
  • fields

A data analyst uses a changelog to record how the data evolves while cleaning their data. What data cleaning best practice does this describe?

  • Examination
  • Disclosure
  • Illumination
  • Documentation
  1. Verification and reporting come directly before the data-cleaning process.
  • True
  • False
  1. What is the first step in the verification process?
  • Compare the cleaned data with the original, uncleaned dataset
  • Create a chronological list of modifications made to the data
  • Determine the quality of the data
  • Inform others of your data-cleaning effort
  1. Which of the following functions automatically remove extra spaces when cleaning data?
  • SNIP
  • REMOVE
  • TRIM
  • CLEAR
  1. What tool can a data analyst use to figure out how many identical errors occur in a dataset?
  • CASE
  • COUNTA
  • CONFIRM
  • COUNT
  1. Fill in the blank: A data analyst uses the CASE statement to consider one or more _____, then returns a value.
  • additions
  • conditions
  • identifications
  • changes
  1. What is the process of tracking changes, additions, deletions, and errors during data cleaning?
  • Recording
  • Observation
  • Cataloging
  • Documentation
  1. Fill in the blank: While cleaning data, a data analyst can use a changelog to keep a chronological list of changes they make. They can refer to it during the _____ period if there are errors or questions.
  • presenting
  • verification
  • documentation
  • visualization
  1. Reviewing version history is an effective way to view a changelog in SQL.
  • True
  • False

Shuffle Q/A

  1. In what step of the data-cleaning process do you find mistakes before you begin analyzing the data?
  • Confirming
  • Publishing
  • Verifying
  • Processing
  1. During the data cleaning process you find a significant amount of data that contains irrelevant spaces. Which function do you use to remove leading, trailing, or repeated spaces?
  • CUT
  • DELETE
  • TRIM
  • TIDY
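Removing leading, trailing, and repeated spaces can be sketched in Python. The helper `spreadsheet_trim` is a hypothetical stand-in for the spreadsheet TRIM function; for comparison, `str.strip` only removes spaces at the two ends, which is closer to how TRIM behaves in many SQL dialects.

```python
def spreadsheet_trim(text: str) -> str:
    """Drop leading and trailing spaces and collapse repeated spaces
    between words into a single space."""
    return " ".join(part for part in text.split(" ") if part)


raw = "  New   York  "  # made-up value with extra spaces
fully_trimmed = spreadsheet_trim(raw)
ends_only = raw.strip(" ")
print(repr(fully_trimmed), repr(ends_only))
```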
  1. A data analyst is checking for errors in a dataset. They want to determine how many times the name of a country is in the dataset using a pivot table. What function can they use to find this count?
  • COUNTA
  • CHECK
  • COUNT
  • CASE
  1. You’re writing the SQL query below and need to change “World Wide Web” to “www”. What function would you use to accomplish this task?

SELECT

_____

WHEN 'World Wide Web' THEN 'www'

END AS some_column

FROM

some_table

  • THEN
  • CASE
  • ELSE
  • WHEN
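A complete, runnable version of a query with this shape might look like the sketch below. It assumes SQLite via Python's `sqlite3` module; the column name `term`, the ELSE branch, and the sample rows are additions for the example and are not part of the quiz query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (term TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?)",
    [("World Wide Web",), ("email",)],  # made-up sample data
)

# Check each value against a condition and return a replacement when it
# matches; the ELSE branch keeps every other value unchanged.
rows = conn.execute(
    "SELECT CASE term WHEN 'World Wide Web' THEN 'www' ELSE term END "
    "AS some_column FROM some_table ORDER BY rowid"
).fetchall()
cleaned = [value for (value,) in rows]
print(cleaned)
```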
  1. What should a data analyst actively track throughout the data cleaning process?
  • Additions, changes, and queries
  • Errors, deletions, and notes
  • Changes, resolutions, and deletions
  • Errors, additions, and deletions
  1. A data analyst is in the verification process and needs to verify the modifications that they have made to the data. What could the analyst reference to find the changes they made throughout data cleaning?
  • Changelog
  • Notepad
  • Spreadsheet
  • Metadata
  1. A data analyst commits a query to the repository as a new and improved query. Then, they specify the changes they made and why they made them. This scenario is part of what process?
  • Reporting data
  • Visualizing data
  • Communicating with stakeholders
  • Creating a changelog
  1. The data collected for an analysis project has just been cleaned. What are the next steps for a data analyst? Select all that apply.
  • Reporting
  • Certification
  • Validation
  • Verification
  1. As a data analyst, you will need to keep the big picture in mind throughout any project when verifying data cleaning. What must the analyst do to take a big picture view of the project? Select all that apply.
  • Consider the data
  • Consider the goal
  • Consider the business problem
  • Consider the reporting
  1. During the verification process, you find that you missed a few leading spaces during data cleaning. What function can you use to eliminate these spaces?
  • TRIM
  • TIDY
  • CUT
  • CROP
  1. Which SQL tool considers one or more conditions, then returns a value as soon as a condition is met?
  • THEN
  • WHEN
  • CASE
  • ELSE
  1. Fill in the blank: Documentation is the process of tracking _____ during data cleaning. Select all that apply.
  • additions
  • deletions
  • changes
  • inactivity
  1. Fill in the blank: A changelog contains a _____ list of modifications made to a project.
  • random
  • approximate
  • chronological
  • synchronized
  1. You start a complex project that will take more than a year to complete. You need to document modifications made to your queries throughout the project. What is the correct way to store these modifications?
  • Creating a changelog
  • Creating a notepad
  • Visualizing data
  • Creating a spreadsheet
  1. Fill in the blank: A process to confirm that a data-cleaning effort was well-executed and the resulting data is accurate and reliable is known as _____.
  • verification
  • publishing
  • manipulation
  • processing
  1. A data analyst is in the verification step. They consider the business problem, the goal, and the data involved in their analytics project. What scenario does this describe?
  • Reporting on the data
  • Considering the stakeholders
  • Seeing the big picture
  • Visualizing the data
  1. During data cleaning, you find an error in a username where the ID number was accidentally joined to the user’s last name. You need to figure out if this username has been entered incorrectly more than once in your dataset. If you use a pivot table, what function can you use to determine the number of times this error occurs in your dataset?
  • CASE
  • COUNT
  • COUNTA
  • CHECK
  1. You’re working with a dataset that contains categorical variables. You notice that some of the strings are misspelled or are not capitalized. What function can you use to fix these errors when a condition is met?
  • ELSE
  • CASE
  • WHEN
  • THEN
  1. A data analyst uses a changelog while cleaning data. What process does a changelog support?
  • Illumination
  • Examination
  • Disclosure
  • Documentation
  1. A changelog is essential for storing chronological modifications made during the data cleaning process. When will an analyst refer to the information in the changelog to certify data integrity?
  • Documentation
  • Verification
  • Presenting
  • Visualization
  1. Fill in the blank: As a data analyst, you should always create a _____ to track your additions, deletions, errors, and changes to a query.
  • notepad
  • database
  • changelog
  • spreadsheet
  1. Fill in the blank: TRIM is a function that removes _____ spaces in data. Select all that apply.
  • repeated
  • trailing
  • leading
  • inner
  1. While verifying cleaned data, a data analyst encounters a misspelled name. Which function can they use to determine the number of misspelled occurrences in the dataset?
  • CASE
  • CHECK
  • COUNTA
  1. At what point during the analysis process does a data analyst use a changelog?
  • While cleaning the data
  • While visualizing the data
  • While gathering the data
  • While reporting the data
  1. Your manager points out an error in a product ID number in your dataset. The Product IDs can be numbers like 42 or text like "CAD-425". Using a pivot table, what function can you use to find how many times this error occurs in the dataset?
  • COUNT
  • CHECK
  • COUNTA
  • CASE
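The difference between counting only numeric values and counting every non-empty value, which is how spreadsheet COUNT and COUNTA differ, can be sketched in Python. The sample values are invented and mix numbers with text, as in the question.

```python
# Hypothetical column of product IDs: numbers, text, and empty cells.
values = [42, "CAD-425", "", 17, "CAD-425", None]

# In the spirit of COUNT: only numeric values are counted.
count_numeric = sum(
    1
    for v in values
    if isinstance(v, (int, float)) and not isinstance(v, bool)
)

# In the spirit of COUNTA: every non-empty value is counted, text included.
count_non_empty = sum(1 for v in values if v not in ("", None))

print(count_numeric, count_non_empty)
```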
  1. While reviewing your coworker’s data cleaning process, you find a few cases of trailing spaces in the data. What function can you use to remove these spaces?
  • REMOVE TRAILING
  • DELETE
  • CUT
  • TRIM
  1. Which of the following queries considers one or more conditions and returns a value as soon as that condition is met?
  • SELECT * WHEN CASE COLUMN = VARIABLE
  • SELECT * CASE IF COLUMN = VARIABLE
  • SELECT * CASE WHEN COLUMN = VARIABLE
  • SELECT * IF CASE COLUMN = VARIABLE

Course challenge

Scenario 2, questions 6-10

You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:

C4 B.Spoke Market Research Job Description.pdf (PDF file)

So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:

C4 S2 Email from Recruiter.pdf (PDF file)

You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins.

For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use a spreadsheet function to find the information we need.

There is a spreadsheet function that allows a data analyst to search for a value in the first column of a given range and return the value of a specified cell in the row in which it is found. What function allows you to complete these tasks?

  • COUNTIF
  • SEARCH
  • RETURN
  • VLOOKUP
  1. Scenario 1, questions 1-5

You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.

Before the meeting you review the About Us tab on their website and their business plan, linked below:

Meer-Kitty Interior Design has two goals. They want to expand their online audience, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.

To use the template for the survey feedback, click the link below and select “Use Template.”

Link to template: Kitty Survey Feedback

OR

If you don’t have a Google account, download the file directly from the attachment below.

When you refer to the Meer-Kitty survey feedback tab, you are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times.

As the survey has too few responses and numerous duplicates that are skewing results, what are your options? Select all that apply.

  • Repeat the survey in order to create a new, improved dataset.
  • Talk with stakeholders and ask for more time.
  • Remove the duplicates from the data and proceed with analysis.
  • Locate another dataset about indoor paint.
  1. Scenario 1 continued

During the meeting, you also learn that Meer-Kitty videos are hosted on their website. For each product offered, there is an accompanying video for customers to learn more. So, more views for a video suggests greater consumer interest.

Your goal is to identify which videos are most popular, so Meer-Kitty knows what topics to explore in the future. Unfortunately, Meer-Kitty has just three months of data available because they only recently launched the videos on their site.

Without enough data to identify long-term trends about the video subjects that people prefer, what should you do?

  • Tell the client you’re sorry, but there is no way to meet their objective.
  • Find an alternate data source that will still enable you to meet your objective.
  • Watch the videos and use your gut instinct to identify which are most successful.
  • Move ahead with the data you have to determine the top video subjects.
  1. Scenario 1 continued

Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole.

Clearly, one particular respondent, the superfan, is overrepresented. This is an example of margin of error.

  • True
  • False
  1. Scenario 1 continued

The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.

To use the template for the survey feedback, click the link below and select “Use Template.”

Link to template: Kitty Survey Feedback

Or, if you don’t have a Google account, download the file directly from the attachment below.

If you are using the template, please refer to the New Meer-Kitty survey feedback tab. You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls.

You decide to use a spreadsheet tool that changes how cells appear when they contain the word Yes. Which tool do you use?

  • Filtering
  • Data validation
  • CONCATENATE
  • Conditional formatting
  1. Scenario 1, continued

You have finished cleaning the data to ensure it is complete, correct, and relevant to the problem you’re trying to solve. Then, you complete the verification and reporting processes to share the details of your data-cleaning effort with your team.

Your team notes one aspect of data cleaning that would help improve the dataset. They point out that the new survey also has a new question in Column G: “What are your favorite indoor paint colors?” This was a free-response question, so respondents typed in their answers. Some people included multiple different colors of paint. In order to determine which colors are most popular, it will be necessary to put each color in its own cell.

You use a spreadsheet function to divide the text strings in Column G around the commas and put each fragment into a new, separate cell. In this example, what are the commas called?

  • Substrings
  • MIDs
  • Delimiters
  • Partitions
  1. Scenario 2, questions 6-10

You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:

So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:

You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins.

For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use spreadsheet functions to help us find the information we need.

What function would you use to search for a certain value in a spreadsheet column to return the corresponding piece of information?

  • RETURN
  • SEARCH
  • COUNTIF
  • VLOOKUP
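
A rough Python sketch of the lookup idea may help: search the first column of a range for a value and return the entry from another column in the same row. The vlookup helper and the customer rows are assumptions made up for this example. The sketch covers exact matches only and returns None where a spreadsheet would show an error.

```python
def vlookup(search_key, rows, index):
    """Return the value in column `index` (1-based) of the first row whose
    first column equals search_key, or None if no row matches."""
    for row in rows:
        if row[0] == search_key:
            return row[index - 1]
    return None


# Hypothetical lookup range: customer ID in the first column, then name and city.
customers = [
    (101, "Ana", "Lisbon"),
    (102, "Ben", "Toronto"),
    (103, "Chi", "Hanoi"),
]

print(vlookup(102, customers, 3))
```
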
  1. Scenario 2, continued

Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL queries. She explains that the data her team receives from customer surveys sometimes has many duplicate entries.

She says: Spreadsheets have a great tool for that called remove duplicates. But when writing a SQL query, what command should you include in your SELECT statement to remove duplicates?

  • DIVERSE
  • DIFFERENT
  • DISCRETE
  • DISTINCT
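
The effect of DISTINCT can be tried out with a small sketch using Python's built-in sqlite3 module. The survey table and its rows are hypothetical, and other SQL dialects may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (respondent_id INTEGER, favorite_color TEXT)")
# Hypothetical responses in which respondent 588 submitted the same answer three times.
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [(588, "blue"), (588, "blue"), (601, "green"), (588, "blue")],
)

# Without DISTINCT every stored row comes back, duplicates included.
all_rows = conn.execute("SELECT respondent_id, favorite_color FROM survey").fetchall()

# With DISTINCT in the SELECT statement, identical rows are returned once.
distinct_rows = conn.execute(
    "SELECT DISTINCT respondent_id, favorite_color FROM survey ORDER BY respondent_id"
).fetchall()

print(len(all_rows), distinct_rows)
```
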
  1. Scenario 2, continued

Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format.

She asks: What function would you use to convert data in a SQL table from one datatype to another?

  • CAST
  • CHANGE
  • CONVERSE
  • COALESCE
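
As a hedged illustration of converting a data type inside a query, the sketch below uses Python's sqlite3 module with a made-up purchases table whose prices were imported as text. Type names and conversion rules vary between SQL dialects, so treat this as one possible instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table where price was imported as text instead of a number.
conn.execute("CREATE TABLE purchases (item TEXT, price TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("brush", "4.50"), ("roller", "12")],
)

# Sorting the raw text compares character by character, so "12" sorts before "4.50".
text_order = [row[0] for row in conn.execute("SELECT item FROM purchases ORDER BY price")]

# CAST converts the text to a number, so the sort follows numeric value.
numeric_rows = conn.execute(
    "SELECT item, CAST(price AS REAL) AS price_number "
    "FROM purchases ORDER BY price_number"
).fetchall()

print(text_order, numeric_rows)
```
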
  1. Scenario 2, continued

Next, your interviewer explains that one of their clients is an online retailer that has a vast inventory. She has a list of items by name, color, and size. Then, she has another list of the price of each item by size, as a larger item sometimes costs more. The stakeholder needs one list of all items by name, color, size, and price.

She then says: In situations such as this one, could you use the CONCAT function to add strings together to create new text strings?

  • Yes
  • No
  1. Scenario 2, continued

For your final question, your interviewer explains that her team often comes across data with extra leading or trailing spaces.

She asks: Which function would enable you to eliminate those extra spaces? You respond: To eliminate extra spaces for consistency, use the TRIM function.

  • True
  • False

  1. Scenario 1, questions 1-5

You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.

Before the meeting you review the About Us tab on their website and their business plan, linked below:

Meer-Kitty Interior Design has two goals. They want to expand their online presence, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.

To use the template for the survey feedback, click the link below and select “Use Template.”

Link to template: Kitty Survey Feedback

OR

If you don’t have a Google account, download the file directly from the attachment below.

When you refer to the Meer-Kitty survey feedback tab, you are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times.

As the survey has too few responses and numerous duplicates that are skewing results, you should remove the duplicates and continue analyzing the remaining 29 responses.

  • True
  • False
  1. Scenario 1 continued

During the meeting, you also learn that Meer-Kitty videos are hosted on their website. For each product offered, there is an accompanying video for customers to learn more. So, more views for a video suggests greater consumer interest.

Your goal is to identify which videos are most popular, so Meer-Kitty knows what topics to explore in the future. Unfortunately, Meer-Kitty has just three months of data available because they only recently launched the videos on their site.

Without enough data to identify long-term trends about the video subjects that people prefer, what are your available options? Select all that apply.

  • Move ahead with the data you have to determine the top video subjects.
  • Watch the videos and use your gut instinct to identify which are most successful.
  • Ask to wait for more data and provide Meer-Kitty with an updated timeline.
  • Talk with Meer-Kitty stakeholders and ask to adjust the objective.
  1. Scenario 1 continued

The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.

To use the template for the survey feedback, click the link below and select “Use Template.”

Link to template: Kitty Survey Feedback

OR

If you don’t have a Google account, download the file directly from the attachment below.

If you are using the template, please refer to the New Meer-Kitty survey feedback tab. You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls.

You decide to use a spreadsheet tool that changes how cells appear when they contain the word Yes. When using this tool, what is the word Yes?

  • The value in a VLOOKUP statement
  • The value in a conditional formatting rule
  • The value in a CONCATENATE range
  • The value in the COUNTA range
  1. Scenario 2, questions 6-10

You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:

So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:

You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins.

For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use a spreadsheet function to find the information we need.

There is a spreadsheet function that searches for a value in the first column of a given range and returns the value of a specified cell in the row in which it is found. It is called SEARCH.

  • True
  • False
  1. Scenario 2, continued

Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL. She explains that the data her team receives from customer surveys sometimes has many duplicate entries.

She says: Spreadsheets have a great tool for that called remove duplicates. Does this mean the team has to remove the duplicate data in a spreadsheet before transferring data to our database?

  • Yes
  • No
  1. Scenario 2, continued

Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format.

She asks: Is there a SQL function that can convert data types such as currency, dates, and times in a SQL table?

  • Yes, data types including currency, dates, and times can be converted.
  • No, only currency can be converted.
  1. Scenario 2, continued

Next, your interviewer explains that one of their clients is an online retailer that needs to create product numbers for a vast inventory. Her team does this by combining the text strings for product number, manufacturing date, and color.

She asks: If you encountered a situation where you wanted to add strings together to create new text strings, which SQL function would you use?

  • COMBINE
  • COALESCE
  • CREATE
  • CONCAT
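
A small sketch of building a product key by joining strings, using Python's sqlite3 module. SQLite has long supported the || operator for concatenation, so the example uses it as a stand-in for the CONCAT function that many other SQL dialects provide. The products table and its values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_number TEXT, made_on TEXT, color TEXT)")
# Hypothetical product row.
conn.execute("INSERT INTO products VALUES ('A100', '2024-01-15', 'teal')")

# Join the three text columns into one new string, separated by hyphens.
# In a dialect with CONCAT this would read:
#   CONCAT(product_number, '-', made_on, '-', color)
product_key = conn.execute(
    "SELECT product_number || '-' || made_on || '-' || color FROM products"
).fetchone()[0]

print(product_key)
```
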
  1. Scenario 2, continued

For your final question, your interviewer explains that her team often comes across data with extra leading or trailing spaces.

She asks: Which SQL function enables you to eliminate those extra spaces for consistency?

  • TRIM
  • LEN
  • SUBSTR
  • LENGTH
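
To show why extra spaces matter for consistency, the sketch below counts distinct values before and after trimming, again with Python's sqlite3 module. The responses table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (color TEXT)")
# Hypothetical answers that differ only by leading or trailing spaces.
conn.executemany(
    "INSERT INTO responses VALUES (?)",
    [("  blue",), ("blue  ",), ("blue",)],
)

# The untrimmed strings are treated as three different values.
raw_count = conn.execute("SELECT COUNT(DISTINCT color) FROM responses").fetchone()[0]

# TRIM removes leading and trailing spaces, so all three collapse into one value.
trimmed_count = conn.execute(
    "SELECT COUNT(DISTINCT TRIM(color)) FROM responses"
).fetchone()[0]

print(raw_count, trimmed_count)
```
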
  1. Scenario 1 continued

Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole.

Clearly, one particular respondent, the superfan, is overrepresented. This means the data doesn’t represent the population as a whole.

When surveying people for Meer-Kitty in the future, what are some best practices you can use to address some of the issues associated with sampling bias? Select all that apply.

  • Increase sample size
  • Use data that keeps updating
  • Use data from only one source
  • Use random sampling
  1. Scenario 1, continued

You have finished cleaning the data to ensure it is complete, correct, and relevant to the problem you’re trying to solve. Then, you complete the verification and reporting processes to share the details of your data-cleaning effort with your team.

Your team notes one aspect of data cleaning that would help improve the dataset. They point out that the new survey also has a new question in Column G: “What are your favorite indoor paint colors?” This was a free-response question, so respondents typed in their answers. Some people included multiple different colors of paint. In order to determine which colors are most popular, it will be necessary to put each color in its own cell.

You decide to use a spreadsheet function to divide the text strings in Column G around the commas and put each fragment into a new, separate cell. You are using the SPLIT function.

  • True
  • False
  1. Scenario 2, continued

Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL. She explains that the data her team receives from customer surveys sometimes has many duplicate entries.

She says: Spreadsheets have a great tool for that called remove duplicates. In SQL, you can include DISTINCT to do the same thing. In which part of the SQL statement do you include DISTINCT?

  • The UPDATE statement
  • The SELECT statement
  • The FROM statement
  • The WHERE statement
  1. Scenario 2, continued

Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format.

She asks: Is there a command or function that converts data in a SQL table from one datatype to another? You respond: Yes, it’s the CAST function.

  • True
  • False
  1. Scenario 2, continued

Next, your interviewer explains that one of their clients is an online retailer that has a vast inventory. She has a list of items by name, color, and size. Then, she has another list of the price of each item by size, as a larger item sometimes costs more. The client needs one list of all items by name, color, size, and price.

She then asks: If you were to use the CONCAT function to complete this task, what would it enable you to do?

  • Search for and return missing products in inventory
  • Create a unique key to tell products apart
  • Clean the product identifier text strings
  • Create a new product database table
  1. Scenario 2, continued

For your final question, your interviewer explains that her team often uses the TRIM function when writing SQL queries.

She asks: What is the TRIM function used for in SQL?

  • To eliminate extra leading or trailing spaces
  • To return the smallest numeric value from a list
  • To shorten the list of results
  • To eliminate null values
  1. Scenario 1, questions 1-5

You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.

Before the meeting you review the About Us tab on their website and their business plan, linked below:

Meer-Kitty Interior Design has two goals. They want to expand their online audience, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.

To use the template for the survey feedback, click the link below and select “Use Template.”

Link to template: Kitty Survey Feedback

OR

If you don’t have a Google account, download the file directly from the attachment below.

When you refer to the Meer-Kitty survey feedback tab, you are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times.

As the survey has too few responses and numerous duplicates that are skewing results, you decide to repeat the survey in order to create a new, improved dataset. What is your first step?

  • Delete all of the data from the current, skewed survey.
  • Write new, improved survey questions.
  • Find a survey tool that only allows someone to complete the survey once.
  • Talk with stakeholders, explain the new timeline, and ask for approval.
  1. Scenario 1 continued

Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole.

Clearly, one particular respondent, the superfan, is overrepresented. What does this situation describe?

  • Sampling bias
  • Statistical significance
  • Margin of error
  • Confidence level
  1. Scenario 1 continued

The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.

To use the template for the survey feedback, click the link below and select “Use Template.”

Link to template: Kitty Survey Feedback

OR

If you don’t have a Google account, download the file directly from the attachment below.

If you are using the template, please refer to the New Meer-Kitty survey feedback tab. You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls.

You decide to use a spreadsheet tool that changes how cells appear when they meet a certain value — in this case, the word Yes. You are using VLOOKUP.

  • True
  • False

Course 5 – Analyze Data to Answer Questions

Week 1 – Organizing data to begin analysis

Which of the following tasks would a data analyst perform during the analyze phase of the data analysis process? Select all that apply.

  • Preparing a report for the stakeholders
  • Getting input from others
  • Visualizing the data with charts
  • Organizing data into understandable sections

A data analyst working on a dataset performs several calculations with the data. What phase of analysis is the analyst in?

  • Transform data
  • Organize data
  • Get input from others
  • Format and adjust data

A data analyst is sorting spreadsheet data. What tool should they use to make sure that the data across rows is kept together when they rearrange the data?

  • Sort together
  • Sort sheet
  • Sort column
  • Sort rows

A data analyst sorts a spreadsheet range between cells A15 and G71. They sort in ascending order by the second column, Column B. What is the syntax they are using?

  • =SORT(A15:G71, 2, FALSE)
  • =SORT(A15:G71, 2, TRUE)
  • =SORT(A15:G71, B, TRUE)
  • =SORT(A15:G71, B, FALSE)
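
The arguments of the spreadsheet SORT function (range, sort column as a number, and TRUE for ascending or FALSE for descending) can be mirrored in a short Python sketch. The sort_range helper and the sample rows are assumptions made for this example, not spreadsheet internals.

```python
def sort_range(rows, sort_column, is_ascending):
    """Sort rows by a 1-based column number, keeping each row's cells together."""
    return sorted(
        rows,
        key=lambda row: row[sort_column - 1],
        reverse=not is_ascending,
    )


# Hypothetical two-column range: name in column 1, score in column 2.
scores = [("Cara", 72), ("Abe", 95), ("Bo", 88)]

ascending = sort_range(scores, 2, True)    # like =SORT(range, 2, TRUE)
descending = sort_range(scores, 2, False)  # like =SORT(range, 2, FALSE)

print(ascending)
print(descending)
```

Note that the sort column is given as a number counted from the left edge of the range, not as a column letter.
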
  1. What is the goal of the analysis phase of the data analysis process?
  • To describe data structures
  • To generate new data
  • To identify trends and relationships in data
  • To make generalizations about data
  1. During which of the four phases of analysis do you compare your data to external sources?
  • Format and adjust data
  • Transform data
  • Get input from others
  • Organize data
  1. Which of the following actions might occur when transforming data? Select all that apply.
  • Identify a pattern in your data
  • Make calculations based on your data
  • Recognize relationships in your data
  • Eliminate irrelevant info from your data
  1. Typically, a data analyst uses filters when they want to expand the amount of data they are working with.
  • True
  • False
  1. A data analyst is sorting data in a spreadsheet. They select a specific collection of cells in order to limit the sorting to just specified cells. Which spreadsheet tool are they using?
  • Sort Sheet
  • Sort Range
  • Limit Sort
  • Limit Range
  1. A data analyst sorts a spreadsheet range between cells D5 and M5. They sort in descending order by the third column, Column F. What is the syntax they are using?
  • =SORT(D5:M5, C, TRUE)
  • =SORT(D5:M5, 3, FALSE)
  • =SORT(D5:M5, C, FALSE)
  • =SORT(D5:M5, 3, TRUE)
  1. You are querying a database that contains data about music. Each musical genre is given an ID number. You are only interested in data related to the genre with ID number 7. The genre IDs are listed in the genre_id column.

You write the SQL query below. Add a WHERE clause that will return only data about the genre with ID number 7.

Who is the composer listed in row 4 of your query result?

  • Caetano Veloso
  • Marisa Monte
  • Lulu Santos
  • Gilberto Gil
  1. You are working with a database that contains invoice data about online music purchases. You are only interested in invoices sent to customers located in the city of Delhi. You want to sort the invoices by order total in ascending order. The order totals are listed in the total column.

You write the SQL query below. Add an ORDER BY clause that will sort the invoices by order total in ascending order.

What total appears in row 4 of your query result?

  • 1.98
  • 5.94
  • 8.91
  • 3.96

  1. Fill in the blank: The _____ phase of the data analysis process includes organizing data, formatting and adjusting data, getting input from others, and transforming data by observing relationships between data points and making calculations.
  • process
  • prepare
  • analyze
  • act
  1. During which of the four phases of analysis do you gather the relevant datasets into a usable structure for a project?
  • Format and adjust data
  • Get input from others
  • Transform data
  • Organize data
  1. Fill in the blank: Sorting ranks data based on a specific _____ that you select.
  • calculation
  • observation
  • metric
  • model
  1. A data analyst is sorting data in a spreadsheet. Which tool are they using if all of the data is sorted by the ranking of a specific sorted column and data across rows is kept together?
  • Sort Sheet
  • Sort Together
  • Sort Rank
  • Sort Document
  1. A data analyst sorts a spreadsheet range between cells A1 and E50. They sort in descending order by the fourth column, Column D. What is the syntax they are using?
  • =SORT(A1:E50, 4, FALSE)
  • =SORT(A1:E50, 4, TRUE)
  • =SORT(A1:E50, D, TRUE)
  • =SORT(A1:E50, D, FALSE)
  1. You are querying a database that contains data about music. You are only interested in data related to the jazz musician Miles Davis. The names of the musicians are listed in the composer column.

You write the following SQL query, but it is incorrect. What is wrong with the query?

SELECT *

FROM Track

WHERE composer = Miles Davis

  • Line 3 should be rewritten as WHERE composer is Miles Davis.
  • Composer in line 3 should be capitalized.
  • SELECT, FROM, and WHERE should not be capitalized.
  • Miles Davis should be in double quotation marks.
  1. You are working with a database that contains invoice data about online music purchases. You are only interested in invoices sent to customers located in the city of Paris. You want to sort the invoices by order total in ascending order. The order totals are listed in the total column.

You write the SQL query below. However, this query is incorrect. What is wrong with it?

SELECT *

FROM invoice

WHERE billing_city = “Paris”

ORDER total

  • SELECT, FROM, WHERE, and ORDER are capitalized.
  • Line 4 is missing the text column = between ORDER and total.
  • In line 3, “Paris” has quotation marks.
  • Line 4 is missing the word BY between ORDER and total.
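
For comparison, a working filter-and-sort query can be sketched with Python's sqlite3 module. The invoice table below is a small hypothetical stand-in for the course database. The string literal uses single quotes, which is the standard SQL form; some dialects such as BigQuery also accept double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, billing_city TEXT, total REAL)")
# Hypothetical invoices; the real course database is not reproduced here.
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [(1, "Paris", 8.91), (2, "Delhi", 1.98), (3, "Paris", 1.98), (4, "Paris", 5.94)],
)

# WHERE filters to one city; ORDER BY sorts ascending by default.
query = """
SELECT *
FROM invoice
WHERE billing_city = 'Paris'
ORDER BY total
"""
paris_totals = [row[2] for row in conn.execute(query)]

print(paris_totals)
```
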
  1. After collecting the relevant datasets for their analysis, a data analyst compares this data to external sources. In which of the four phases of analysis does this occur?
  • Organize data
  • Format and adjust data
  • Transform data
  • Get input from others
  1. A data analyst working on a data set is investigating possible relationships in the data. What phase of analysis is the analyst in?
  • Format and adjust data
  • Get input from others
  • Transform data
  • Organize data
  1. A data analyst sorts a spreadsheet range between cells K9 and L20. They sort in ascending order by the first column, Column K. What is the syntax they are using?
  • =SORT(K9:L20, K, TRUE)
  • =SORT(K9:L20, K, FALSE)
  • =SORT(K9:L20, 1, TRUE)
  • =SORT(K9:L20, 1, FALSE)
  1. You are querying a database that contains data about music. Each album is given an ID number. You are only interested in data related to the album with ID number 3. The album IDs are listed in the album_id column.

You write the following SQL query, but it is incorrect. What is wrong with the query?

SELECT *

FROM Track

WHERE album = 3

  • In line 3, album should be album_id.
  • SELECT, FROM, and WHERE should be capitalized.
  • In line 3, album is not capitalized.
  • Line 3 contains an equal sign.
  1. In the data analysis process, which of the following refers to a phase of analysis? Select all that apply.
  • Format data using sorts and filters
  • Get input from others
  • Organize data into understandable sections
  • Visualize the data
  1. A data analyst is collecting all the datasets that are relevant to their project. Which of the four phases of analysis is the data analyst in?
  • Get input from others
  • Organize data
  • Format and adjust data
  • Transform data
  1. A data analyst investigating a data set is interested in showing only data that matches given criteria. What is this known as?
  • Sorting
  • Modeling
  • Measuring
  • Filtering
  1. You are working with a database that contains invoice data about online music purchases. You are only interested in invoices sent to customers located in the city of Delhi. You want to sort the invoices by order total in ascending order. The order totals are listed in the total column.

You write the SQL query below. However, this query is incorrect. What is wrong with it?

SELECT *

FROM invoice

WHERE billing_city = “Delhi”

ORDER BY order_total

  • SELECT, FROM, WHERE, and ORDER BY are capitalized.
  • In line 4, order_total should be total.
  • In line 3, “Delhi” has quotation marks.
  • Line 4 contains the word BY.
  1. A data analyst chooses to rank the data based on a specific metric. What is the term for this action?
  • Sorting
  • Filtering
  • Modeling
  • Measuring
  1. A data analyst investigates the data they’ve collected to look for patterns and relationships between the data. They also perform calculations based on the data. In which of the four phases of analysis does this occur?
  • Format and adjust data
  • Transform data
  • Get input from others
  • Organize data
  1. A data analyst working on a very large dataset decides to narrow the scope of the data that they are working with in order to make the analysis more manageable. What can they use to narrow the amount of data?
  • Modeling
  • Sorting
  • Filtering
  • Measuring
  1. A data analyst uses a function to sort a spreadsheet range between cells H1 and K65. They sort in ascending order by the first column, Column H. What is the syntax they are using?
  • =SORT(H1:K65, 1, FALSE)
  • =SORT(H1:K65, A, TRUE)
  • =SORT(H1:K65, A, FALSE)
  • =SORT(H1:K65, 1, TRUE)
  1. You are querying a database that contains data about music. Each musical genre is given an ID number. You are only interested in data related to the genre with ID number 2. The genre IDs are listed in the genre_id column.

You write the following SQL query, but it is incorrect. What is wrong with the query?

SELECT *

FROM Track

WHERE composer = 2

  • Line 3 contains an equal sign.
  • Composer should be genre_id in line 3.
  • Composer is not capitalized in line 3.
  • SELECT, FROM, and WHERE are capitalized.
  1. You are performing a calculation during your analysis of a dataset. Which phase of analysis are you in?
  • Get input from others
  • Format and adjust data
  • Organize data
  • Transform data
  1. A data analyst is sorting spreadsheet data. They use the spreadsheet tool Sort Sheet. What does this tool do?
  • It sorts all of the data in a spreadsheet by a specific sorted column.
  • It sorts all of the data in a spreadsheet by the ranking of a specific sorted row.
  • It allows the analyst to sort by a specific sorted row.
  • It allows the analyst to sort a specific selection of cells only.

Week 2 – Formatting and adjusting data

You are responsible for maintaining the integrity of a dataset. Multiple analysts are working with this spreadsheet. What spreadsheet tool can you use to ensure that accidental changes are not recorded in the data?

  • Data validation
  • Find
  • Pop-up menus
  • Conditional formatting

You are working with a SQL database with tables for flight routes in Canada. The table contains one column with the names of the departure airports. A different column in the same table contains the names of the arrival airports. What function can you use in your query to combine the arrival and departure airport names into a new column?

  • COMBINE
  • GROUP
  • CONCAT
  • JOIN

You are querying a database of ice cream flavors to determine which stores are selling the most mint chip. For your project, you only need the first 80 records. What clause should you add to the following SQL query?

SELECT flavors FROM ice_cream_table WHERE flavor = “mint_chip”

  • LIMIT 80
  • LIMIT ,80
  • LIMIT = 80
  • LIMIT_80
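
A minimal sketch of capping the number of returned records, using Python's sqlite3 module. The ice_cream_sales table, its columns, and its rows are hypothetical. The cap is set to 3 rather than 80 to keep the sample data short, and an ORDER BY is added so the returned rows are predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ice_cream_sales (store TEXT, flavor TEXT)")
# Hypothetical data: five stores selling mint chip and one selling vanilla.
rows = [(f"store_{n}", "mint_chip") for n in range(1, 6)] + [("store_6", "vanilla")]
conn.executemany("INSERT INTO ice_cream_sales VALUES (?, ?)", rows)

# LIMIT comes last in the query and is followed directly by the row count.
first_three = conn.execute(
    "SELECT store FROM ice_cream_sales "
    "WHERE flavor = 'mint_chip' "
    "ORDER BY store "
    "LIMIT 3"
).fetchall()

print(first_three)
```
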
  1. An analyst notes that the “160” in cell A9 is formatted as text, but it should be Australian dollars. What spreadsheet tool can help them select the right format?
  • Format as Dollar
  • EXCHANGE
  • CURRENCY
  • Format as Currency
  1. You are creating a spreadsheet to help you with your job search. Every time you find an interesting job, you add it to the spreadsheet. Then, you want to indicate two possible options: Need to Apply or Applied. What spreadsheet tool will save you time by enabling you to create a dropdown list with Need to Apply and Applied as the possible options?
  • Pop-up menus
  • Data validation
  • Find
  • Conditional formatting
  1. You are using a spreadsheet to keep track of your newspaper subscriptions. You add color to indicate if a subscription is current or has expired. Which spreadsheet tool changes how cells appear when values meet a condition, such as reaching an expiration date?
  • Data validation
  • CONVERT
  • Conditional formatting
  • Add color
  1. You are analyzing data about the capitals of different countries. In your SQL database, you have one column with the names of the countries and another column with the names of the capitals. What function can you use in your query to combine the countries and capitals into a new column?
  • GROUP
  • CONCAT
  • JOIN
  • COMBINE
  1. You are querying a database of museums to determine which ones will have a sculpture exhibit this year. For your project, you only need the first 50 records. What clause should you add to the following SQL query?

SELECT museums

FROM museum_table

WHERE exhibit = “sculpture”

  • LIMIT 50
  • LIMIT,50
  • LIMIT = 50
  • LIMIT_50
  1. A data analyst is working with a spreadsheet that has very long text strings. Rather than counting the characters themselves to determine the number of characters they contain, what tool can they use?
  • The MID function
  • The CHAR function
  • The LEN function
  • The COUNT function
  1. Spreadsheet cell L6 contains the text string “Function”. To return the substring “Fun”, what is the correct syntax?
  • =RIGHT(L6, 3)
  • =LEFT(3,L6)
  • =LEFT(L6, 3)
  • =RIGHT(3,L6)
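
The spreadsheet functions LEFT, RIGHT, and LEN take the text first and the character count second. A rough Python equivalent using string slicing is sketched below. The helper names are assumptions for this example, and the spreadsheet functions handle errors differently.

```python
def left(text, count):
    """Return the first `count` characters, like =LEFT(text, count)."""
    return text[:count]


def right(text, count):
    """Return the last `count` characters, like =RIGHT(text, count)."""
    return text[max(len(text) - count, 0):]


def length(text):
    """Return the number of characters, like =LEN(text)."""
    return len(text)


print(left("Function", 3))    # Fun
print(right("Dashboard", 5))  # board
print(length("Function"))     # 8
```
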
  1. When working with spreadsheets, data analysts can use the WHERE function to locate specific characters in a string.
  • True
  • False

  1. An analyst has financial data that is formatted as Canadian dollars, but it should be formatted as U.S. dollars. What spreadsheet tool can help them select the right format?
  • Format as Dollars
  • Format as Number
  • Format as Currency
  • Format as Money
  1. You are preparing a project tracker spreadsheet. Next to each project task, you need to add the name of the team member responsible. What spreadsheet tool will save you time by enabling you to create a drop-down list with team members’ names as the possible options?
  • Find
  • Conditional formatting
  • Pop-up menus
  • Data validation
  1. You are working with a SQL database that contains tables for the locations for a popular fast food restaurant. In this database, you have one column with the city location and another column with the state location for each restaurant. What function can you use in your query to combine the city and state into a new column?
  • COMBINE
  • CONCAT
  • JOIN
  • GROUP
  1. Fill in the blank: A data analyst is working with a spreadsheet that has very long text strings. They use the LEN function to count the number of _____ in the text strings.
  • substrings
  • characters
  • values
  • fields
  1. Spreadsheet cell H8 contains the text string “Marketing”. To return the substring “Market”, what is the correct syntax?
  • =RIGHT(6,H8)
  • =LEFT(H8, 6)
  • =RIGHT(H8, 6)
  • =LEFT(6,H8)
  1. You are querying a database of restaurant locations to determine how many fast food companies have restaurants located in Texas. For your project, you only need the first 20 records. What clause should you add to the following SQL query?

SELECT fast_food

FROM restaurant_table

WHERE location = “Texas”

  • LIMIT,20
  • LIMIT_20
  • LIMIT 20
  • LIMIT = 20
  1. A data analyst is working with a spreadsheet that has very long text strings. They use a function to count the number of characters in cell B9. What is the correct syntax of the function?
  • =LEN(B9)
  • =LEN(“B9”)
  • =LEN(B,9)
  • =LEN(B:B9)
  1. You are working with a data set that contains string data. Cell C4 contains the string “Oct 13, 2004”. What does the function FIND(“,”, C4) output?
  • 4
  • 6
  • 8
  • 7
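
The position that FIND reports is counted from 1, which a short Python sketch can make concrete. The find helper is an assumption for illustration. Python's own str.index counts from 0, so the helper adds 1, and it raises ValueError when the text is missing, where a spreadsheet would show an error value.

```python
def find(search_for, text):
    """Return the 1-based position of search_for in text, like FIND(search_for, text)."""
    return text.index(search_for) + 1


date_text = "Oct 13, 2004"

comma_position = find(",", date_text)
space_position = find(" ", date_text)

print(comma_position, space_position)
```
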
  1. An analyst notes that the “235” in cell B8 is formatted as text, but it should be Euros. What spreadsheet tool can help them select the right format?
  • Format as Euros
  • Format as Money
  • Format as Number
  • Format as Currency
  1. A utility company uses a spreadsheet to track the number of consecutive months each customer has paid their bill on time. They use a spreadsheet tool to apply color to the cells when the number of consecutive months is 12 or greater. What tool are they using?
  • Data validation
  • Add color
  • CONVERT
  • Conditional formatting
  1. Spreadsheet cell F2 contains the text string “Dashboard”. To return the substring “board”, what is the correct syntax?
  • =LEFT(5,F2)
  • =LEFT(F2, 5)
  • =RIGHT(5,F2)
  • =RIGHT(F2, 5)
  1. You are using the FIND function to identify the position of the whitespace in the string in cell A6. Which of the following is the correct function syntax for this purpose?
  • FIND("_", A6)
  • FIND(A6, _ )
  • FIND(A6, " ")
  • FIND(" ", A6)
  1. You are analyzing employee data for your company. In your SQL database, you have one column with the first names of the employees and another column with their last names. What function can you use in your query to combine the employee first names and last names into a new column?
  • CONCAT
  • COMBINE
  • JOIN
  • GROUP
  1. An analyst is working with a dataset of financial data. The data is formatted as U.S. dollars, and the analyst needs it to be in Japanese yen. What spreadsheet tool can help them select the right format?
  • Format as Currency
  • Format as Money
  • Format as Number
  • Format as Yen
  1. Which of the following are appropriate uses for a spreadsheet’s data validation tool? Select all that apply.
  • Avoiding invalid inputs to functions
  • Adding drop down menus on cells
  • Merging two or more columns.
  • Protecting structured data
  1. You are working with a spreadsheet that records the running time of various songs. What spreadsheet tool can you use to change how the cells appear when their value is less than 20 seconds?
  • CONVERT
  • Data validation
  • Conditional formatting
  • Add color
  1. A data analyst wants to write a SQL query to combine data from two columns and into a new column. What function can they use?
  • GROUP
  • CONCAT
  • JOIN
  • COMBINE
  1. Fill in the blank: When working with spreadsheets, data analysts can use the _____ function to locate specific characters in a string.
  • IDENTIFY
  • FROM
  • WHERE
  • FIND
  1. A data analyst at a symphony orchestra uses a spreadsheet to keep track of how many concerts require more than 80 musicians. They use a spreadsheet tool to change how cells appear when values equal 80 or more. What tool are they using?
  • CONVERT
  • Add color
  • Conditional formatting
  • Data validation
  1. A data analyst is working with a spreadsheet that has very long text strings. They use a function to count the number of characters in cell G11. What is the correct syntax of the function?
  • =LEN(“G11”)
  • =LEN(G11)
  • =LEN(G,11)
  • =LEN(G:G11)
  1. Spreadsheet cell C2 contains the text string “Deviation”. To return the substring “Dev”, what is the correct syntax?
  • =LEFT(3,C2)
  • =RIGHT(3,C2)
  • =LEFT(C2, 3)
  • =RIGHT(C2, 3)
  1. When working with spreadsheets, data analysts use the FIND function to locate specific characters in a string. FIND is case-sensitive, so it’s necessary to input the substring exactly how it appears.
  • True
  • False

Week 3 – Aggregating data for analysis

When using VLOOKUP, there are some common limitations that data analysts should be aware of. One of these limitations is that VLOOKUP only returns the first match it finds, even if there are many possible matches within the column.

  • True
  • False
  1. Fill in the blank: Data aggregation involves creating a _____ collection of data that originally came from multiple sources.
  • expanded
  • localized
  • modified
  • summarized
  1. A data analyst uses the SUM function to add together numbers from a spreadsheet. However, after getting a zero result, they realize the numbers are actually text. What function can they use to convert the text to a numeric value?
  • VALUE
  • FIGURE
  • CONVERT
  • DIGIT
  1. When using VLOOKUP, there are some common limitations that data analysts should be aware of. One of these limitations is that VLOOKUP can only return a value from the data to the left of the matched value.
  • True
  • False
  1. Fill in the blank: When writing a function, a data analyst wraps a table array in dollar signs. This is an _____ , which is used to lock the array so rows and columns don’t change if the function is copied.
  • accurate reference
  • absolute reference
  • arbitrary reference
  • authentic reference
  1. The following is a selection from a spreadsheet:

To search for the growth in population in Indonesia, what is the correct VLOOKUP syntax?

  • =VLOOKUP(“Indonesia”, A2:C10, 3, false)
  • =VLOOKUP(Indonesia, A2*C10, 3, false)
  • =VLOOKUP(“Indonesia”, A2:C10, 2, false)
  • =VLOOKUP(Indonesia, A2:C10, 2, false)
  1. An INNER JOIN is a function that returns records with matching values in two or more tables. An OUTER JOIN is a function that combines RIGHT and LEFT JOIN to return all matching records in both tables.
  • True
  • False
  1. A data analyst writes a query that asks a database to return the number of rows in a specified range. Which function do they use?
  • COUNT DISTINCT
  • RANGE
  • COUNT
  • RETURN RANGE
  1. Fill in the blank: In an SQL statement, the _____ is the name of the segment that executes first. Select all that apply.
  • central select
  • inner query
  • central query
  • inner select

Shuffle Q/A

  1. In data analytics, what is the process of gathering data from multiple sources and combining it into a single, summarized collection?
  • Data composition
  • Data aggregation
  • Data grouping
  • Data mapping
  1. A data analyst is performing numerical calculations on the data in their spreadsheet. Ahead of these calculations, they use the VALUE function. Why might they do this?
  • To get a list of all the distinct numbers in the data
  • To convert the numbers in the data from text to numerical values
  • To sum up all the numbers in the spreadsheet
  • To find the average of all the numbers in the spreadsheet
  1. You create a function using data values from a specified array. You notice that it works correctly only some of the time. You verify that the function was used correctly and you ask a colleague for their input. They ask if you locked the data array. What does this mean? Select all that apply.
  • The data array has been made an absolute reference.
  • The columns in the array cannot be changed.
  • The data is accessible with a password.
  • The rows in the array cannot be changed.
  1. The following is a selection from a spreadsheet:

To search for the population of Bangladesh, what is the correct VLOOKUP syntax?

  • =VLOOKUP(“Bangladesh”, A2:B10, 3, false)
  • =VLOOKUP(Bangladesh, A2:B10, 3, false)
  • =VLOOKUP(“Bangladesh”, A2:B10, 2, false)
  • =VLOOKUP(Bangladesh, A2*B10, 2, false)
  1. A data analyst writes the following query in SQL with the RIGHT JOIN function:

FROM fiction_table

RIGHT JOIN

books_table

What does this function do?

  • It returns all the records in the fiction table and only the records from the books table with matching values.
  • It returns all records in both the fiction table and the books table.
  • It returns only the records with values that match from both tables.
  • It returns all records in the books table and only the records from the fiction table with matching values.
  1. The COUNT DISTINCT function includes repeating values when returning values in a specified range.
  • True
  • False
  1. A data analyst writes a query in SQL. Inside this query, they have a second query. What is this second query called? Select all that apply.
  • Subquery
  • Central query
  • Smaller query
  • Nested query
  1. One of the limitations of the VLOOKUP function is that it can only search columns to the right of the column into which it is entered. What is another limitation of VLOOKUP?
  • It will only return the last match it finds.
  • It can only be used on numerical data.
  • It can only be used with text data.
  • It will only return the first match it finds.
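
The first-match behavior can be sketched in a few lines of Python. The `vlookup` helper and the sample rows are hypothetical; the sketch assumes exact matching (the `false` argument in the formulas above) and compares keys exactly, which is a simplification of what a spreadsheet does.

```python
def vlookup(search_key, table, index):
    """Return the value in column `index` (1-based) of the first row
    whose first column equals `search_key`."""
    for row in table:
        if row[0] == search_key:
            return row[index - 1]  # stops at the first match
    return None  # a spreadsheet would show #N/A here

# Hypothetical data: "Brazil" appears twice on purpose.
table = [
    ("Brazil", 100, 1.0),
    ("Indonesia", 200, 2.0),
    ("Brazil", 999, 9.0),
]

print(vlookup("Brazil", table, 2))     # value from the first Brazil row
print(vlookup("Indonesia", table, 3))  # third column of the matched row
```
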
  1. A data analyst wraps the data array for their function in dollar signs ($). What does this do? Select all that apply.
  • It converts the data to currency.
  • It makes it so that columns cannot be changed.
  • It makes it so that rows cannot be changed.
  • It creates an absolute reference.
  1. The following is a selection from a spreadsheet:

To search for the population of Brazil, what is the correct VLOOKUP syntax?

  • =VLOOKUP(“Brazil”, A2:B10, 2, false)
  • =VLOOKUP(Brazil, A2:B10, 2, false)
  • =VLOOKUP(Brazil, A2,B10, 3, false)
  • =VLOOKUP(Brazil, A2:B10, 3, false)
  1. You are writing a query that contains the COUNT function. What should this query return?
  • The number of rows in a specified range
  • The number of times the query has been run
  • The sum of all values in a specified range
  • The number of columns in a specified range
  1. A data analyst wants to be sure all of the numbers in a spreadsheet are numeric. What function should they use to convert text to numeric values?
  • VALUE
  • PROCESS
  • CONVERT
  • EXCHANGE
  1. The following is a selection from a spreadsheet:

To search for the population of Pakistan, what is the correct VLOOKUP syntax?

  • =VLOOKUP(Pakistan, A2*B10, 2, false)
  • =VLOOKUP(Pakistan, A2:B10, 3, false)
  • =VLOOKUP(“Pakistan”, A2:B10, 2, false)
  • =VLOOKUP(“Pakistan”, A2:B10, 3, false)
  1. A data analyst writes the following query in SQL with the LEFT JOIN function:

FROM music_table

LEFT JOIN

Entertainment_table

What does this function do?

  • It returns all records in the music table and only the records from the entertainment table with matching values.
  • It returns only the records with values that match from both tables.
  • It returns all the records in the entertainment table and only the records from the music table with matching values.
  • It returns all records in both the music table and the entertainment table.
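
To see the difference between the join types, here is a small sketch using Python's built-in sqlite3 module. The columns (`id`, `title`, `venue`), the join condition, and all rows are assumptions made for illustration; the question itself only names the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE music_table (id INTEGER, title TEXT);
CREATE TABLE entertainment_table (id INTEGER, venue TEXT);
INSERT INTO music_table VALUES (1, 'Song A'), (2, 'Song B'), (3, 'Song C');
INSERT INTO entertainment_table VALUES (2, 'Hall X'), (3, 'Hall Y'), (4, 'Hall Z');
""")

# INNER JOIN: only rows with a match in both tables.
inner_rows = conn.execute("""
    SELECT m.id, e.venue
    FROM music_table AS m
    INNER JOIN entertainment_table AS e ON m.id = e.id
    ORDER BY m.id
""").fetchall()

# LEFT JOIN: every row from the left table, NULL where there is no match.
left_rows = conn.execute("""
    SELECT m.id, e.venue
    FROM music_table AS m
    LEFT JOIN entertainment_table AS e ON m.id = e.id
    ORDER BY m.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

A RIGHT JOIN behaves like a LEFT JOIN with the two tables swapped: every row from the right table is kept.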
  1. When working with subqueries, which part of the query segment executes first?
  • The inner query
  • The smaller query
  • The outer query
  • The larger query
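
A short sketch of a subquery, again with sqlite3 and invented data. The `orders` table is hypothetical. Conceptually, the inner query's result (the average) has to be known before the outer WHERE condition can be evaluated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, total REAL);
INSERT INTO orders VALUES (1, 10.0), (2, 20.0), (3, 60.0);
""")

# The inner query (SELECT AVG...) supplies the value the outer query filters on.
above_avg = conn.execute("""
    SELECT id
    FROM orders
    WHERE total > (SELECT AVG(total) FROM orders)
    ORDER BY id
""").fetchall()

print(above_avg)
```
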
  1. In data analytics, what is data aggregation?
  • The process of moving certain data points to a higher rank or position.
  • The process of modifying data in order to make it suitable for analysis.
  • The process of ensuring a company’s data is properly stored, managed, and maintained.
  • The process of gathering data from multiple sources and combining it into a single, summarized collection.
  1. VLOOKUP can have problems when used on data values that have leading and trailing spaces. What function can be used to eliminate these spaces?
  • TRIM
  • NOSPACE
  • VALUE
  • CUT
  1. You are using the VLOOKUP function in a specific column in your spreadsheet. You know that one of VLOOKUP’s limitations is that it can only search in columns to the right of the column into which it is entered. What can you do if you also want the function to search the data found to the left?
  • Use VLOOKUP in the leftmost column
  • Use VLOOKUP in the rightmost column
  • Copy that data into new columns to the right
  • Make the data into an absolute reference
  1. A data analyst creates an absolute reference around a function array. What is the purpose of the absolute reference?
  • To automatically change numeric values to currency values
  • To keep a function array consistent so rows and columns will automatically change if the function is copied
  • To lock the function array so rows and columns don’t change if the function is copied
  • To copy a function and apply it to all rows and columns
  1. When creating an SQL query, which JOIN clause returns all matching records in two or more database tables?
  • OUTER
  • INNER
  • LEFT
  • RIGHT
  1. A data analyst is working with data that has been collected over time and stored in different databases. What process must they perform if they are to calculate the statistics of this data?
  • Data aggregation
  • Data mapping
  • Data grouping
  • Data composition
  1. A data analyst uses the TRIM function on their spreadsheet. Why might they do this?
  • They plan to convert all numbers from text into numeric.
  • VLOOKUP needs data values to have leading spaces.
  • VLOOKUP needs data values to have trailing spaces.
  • They plan to use VLOOKUP on the spreadsheet data.
  1. A data analyst uses an absolute reference to lock a function array so rows and columns don’t change if the function is copied. What symbol is used to create an absolute reference?
  • Ampersand (&)
  • Asterisk (*)
  • Dollar sign ($)
  • Hashtag (#)
  1. What are some of the advantages of using subqueries in SQL? Select all that apply.
  • Subqueries can use special functions.
  • The logic is easier to read and understand.
  • All of the logic is in one place.
  • The query processes more efficiently.
  1. The VALUE function converts a numeric value into a text string in a spreadsheet.
  • True
  • False
  1. A data analyst locks the rows and columns in their spreadsheet by wrapping their function’s data array in dollar signs ($). Why would they do this?
  • To avoid incorrect calculations caused by changing the array
  • So that the data auto deletes after the function is used
  • So that other analysts’ functions cannot access the array
  • To stop people from accessing sensitive information in the array
  1. Which of the following terms describe a subquery? Select all that apply.
  • Nested query
  • Inner select
  • Inner query
  • Small query

Week 4 – Performing data calculations

A data analyst uses the following formula to calculate a new column in a SQL query. What best describes the result of the formula?

(colA + colB) / colC = new_col

  • colB is subtracted from colA then the result is multiplied by colC.
  • colB is added to colA then the result is multiplied by colC.
  • colB is divided by colC then the result is added to colA.
  • colB is added to colA then the result is divided by colC.
  1. A data analyst is working with a spreadsheet from a furniture company. To use the template for this spreadsheet, click the link below and select “Use Template.”

Link to template: Sample Transaction Table.

Or, if you don’t have a Google account, download the file directly from the attachment below.

The syntax of which of the following formulas would allow the analyst to count purchase sizes of two or more?

  • =COUNTIF(G2:G30, “>=2”)
  • =COUNTIF(H2:H30, “>=2”)
  • =SUMIF(H2:H30, “=4”)
  • =SUMIF(G2:G30, “<=1”)
  1. You are working in a spreadsheet and use the SUMIF function in the formula below as part of your analysis.

=SUMIF(A1:A25, "<10", C1:C25)

Which part of this formula is the criteria or condition?

  • "<10"
  • =SUMIF
  • C1:C25
  • A1:A25
  1. A data analyst is working in a spreadsheet and uses the SUMPRODUCT function in the formula below as part of their analysis.

=SUMPRODUCT(A2:A10,B2:B10)

How does the SUMPRODUCT function calculate the cell ranges identified in the parentheses?

  • It multiplies the values in the first range, then multiplies the values in the second range.
  • It adds the values in the first range, then adds the values in the second range.
  • It multiplies the ranges, then adds the sum of the products of the two ranges.
  • It adds the ranges, then multiplies them by the last value in the second array.
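
The SUMPRODUCT calculation can be sketched in plain Python: multiply the ranges pairwise, then add up the products. The two lists are hypothetical stand-ins for A2:A10 and B2:B10.

```python
# Hypothetical stand-ins for the two spreadsheet ranges.
quantities = [2, 1, 4]
unit_prices = [5.0, 20.0, 2.5]

# Pairwise products, then their sum: 2*5.0 + 1*20.0 + 4*2.5
total = sum(q * p for q, p in zip(quantities, unit_prices))
print(total)
```
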
  1. You create a pivot table in a spreadsheet containing movie data. To use the template for this spreadsheet, click the link below and select “Use Template.”

Link to template: Movie Data Project.

Or, if you don’t have a Google account, download the file directly from the attachment below.

If you want to summarize the data using the AVERAGE function in the Values menu, which spreadsheet columns could you add data from? Select all that apply.

  • Box Office Revenue
  • Movie Title
  • Genre
  • Budget
  1. A data analyst uses the following SQL query to perform basic calculations on their data. Which types of operators is the analyst using in this SQL query? Select all that apply.
  • Multiplication
  • Addition
  • Subtraction
  • Division
  1. You are working with a database table that contains data about music. The table includes columns for track_id, track_name, composer, and milliseconds (duration of the music track). You are only interested in data about the classical musician Johann Sebastian Bach. You want to know the duration of each Bach track in seconds. You decide to divide milliseconds by 1000 to get the duration in seconds, and use the AS command to store the result in a new column called secs.

Add a statement to your SQL query that calculates the duration in seconds for each track and stores it in a new column as secs.

NOTE: The three dots (...) indicate where to add the statement.

What is the duration in seconds of the track with Id number 3408?

  • 307
  • 120
  • 153
  • 193
  1. You are working with a database table that contains data about music. The table includes columns for album_id and milliseconds (duration of the music tracks on each album). You want to find out the total duration for each album in milliseconds, and store the result in a new column named total_duration.

You write the SQL query below. Add a GROUP BY clause that will group the data by album Id number.

What is the total duration of the album with Id number 2?

  • 257252
  • 959711
  • 342562
  • 858088
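
A minimal sketch of SUM with GROUP BY, using sqlite3. The `track` table name and its values are invented for illustration and are not the quiz dataset, so the output does not correspond to any of the options above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE track (album_id INTEGER, milliseconds INTEGER);
INSERT INTO track VALUES (1, 1000), (1, 2500), (2, 4000);
""")

# GROUP BY collapses the rows to one per album_id; SUM adds within each group.
totals = conn.execute("""
    SELECT album_id, SUM(milliseconds) AS total_duration
    FROM track
    GROUP BY album_id
    ORDER BY album_id
""").fetchall()

print(totals)
```
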
  1. You are working with a database table that contains invoice data. The table includes columns for billing_state, billing_country, and total. You want to know the average total price for the invoices billed to the state of Wisconsin. You decide to use the AVG function to find the average total, and use the AS command to store the result in a new column called average_total.

Add a statement to your SQL query that calculates the average total and stores it in a new column as average_total.

NOTE: The three dots (...) indicate where to add the statement.

What is the average total for Wisconsin?

  • 5.54
  • 5.78
  • 6.08
  • 5.37
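
The AVG plus AS pattern described above can be sketched with sqlite3. The `invoice` table name, the 'WI' state code, and the totals are assumptions made up for this example, not the quiz dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (billing_state TEXT, billing_country TEXT, total REAL);
INSERT INTO invoice VALUES
    ('WI', 'USA', 4.0),
    ('WI', 'USA', 6.0),
    ('CA', 'USA', 10.0);
""")

# AVG aggregates the filtered rows; AS names the resulting column.
cur = conn.execute("""
    SELECT AVG(total) AS average_total
    FROM invoice
    WHERE billing_state = 'WI'
""")
column_name = cur.description[0][0]
average_total = cur.fetchone()[0]

print(column_name, average_total)
```
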

Shuffle Q/A

  1. A data analyst wants to calculate the number of rows that have a value of “shipped”. Which function could they use?
  • =MAX(G2:G30,"=shipped")
  • =SUM(G2:G30,"=shipped")
  • =COUNT(G2:G30,"=shipped")
  • =COUNTIF(G2:G30,"=shipped")
  1. You are working in a spreadsheet and use the SUMIF function in the following formula as part of your analysis.

=SUMIF(D2:D10,">=50",E2:E10)

Which part of this formula indicates the range of values to be added?

  • E2:E10
  • >=50
  • D2:D10
  • =SUMIF
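
As a rough Python analogue of SUMIF and COUNTIF: the condition is tested on one range, and either the matching cells are counted or the corresponding cells of a second range are added. The lists below are hypothetical stand-ins for D2:D10 and E2:E10.

```python
# Hypothetical stand-ins for the criteria range and the sum range.
d_values = [10, 50, 75, 40]
e_values = [1, 2, 3, 4]

# SUMIF(D, ">=50", E): add E where the matching D cell meets the condition.
sumif_result = sum(e for d, e in zip(d_values, e_values) if d >= 50)

# COUNTIF(D, ">=50"): count the D cells that meet the condition.
countif_result = sum(1 for d in d_values if d >= 50)

print(sumif_result, countif_result)
```
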
  1. You create a pivot table and want to add up the total of all cells for each row and column value in the pivot table. Which function in the values menu would you use to summarize the data?
  • AVERAGE
  • SUM
  • PRODUCT
  • COUNTA
  1. What column is set as a value in the following pivot table?
  • Direction
  • Duration
  • MAX
  • Date
  1. In the following SQL query, which column is part of an addition operation that creates a new column?

SELECT

Yes_Responses,

No_Responses,

Total_Surveys,

Yes_Responses + No_Responses AS Responses_Per_Survey

FROM

Survey_1

  • Total_Surveys
  • Responses_Per_Survey
  • Yes_Responses
  • Survey_1
  1. What SQL operator is used to return the remainder of a division operation?
  • /
  • !=
  • <>
  • %
  1. What is the purpose of using data validation during your analysis process?
  • To ensure that you are able to use every piece of data from your raw data
  • To guarantee that all of your stakeholders will be happy with your results
  • To ensure that all data is complete, accurate, secure, and consistent
  • To guarantee that visualizations are visually pleasing
  1. What is the purpose of the <> operator in SQL?
  • To add two values
  • To return the remainder of a division operation
  • To check if two values are not equal
  • To set a value equal to another
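
Both operators from the last few questions can be tried out in SQLite. This is a sketch with an invented `checks` table; the column names mirror the ones used in a later question in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# % returns the remainder of a division.
remainder = conn.execute("SELECT 17 % 5").fetchone()[0]

conn.executescript("""
CREATE TABLE checks (id INTEGER, original_column INTEGER, recalculated_column INTEGER);
INSERT INTO checks VALUES (1, 10, 10), (2, 20, 25), (3, 30, 30);
""")

# <> keeps only the rows where the two values are not equal.
mismatches = conn.execute(
    "SELECT id FROM checks WHERE original_column <> recalculated_column"
).fetchall()

print(remainder, mismatches)
```
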
  1. What is a reason to use a temporary table instead of a standard table in SQL?
  • A temporary table allows functions that are unavailable to standard tables.
  • A temporary table calculates formulas using less memory than standard tables.
  • A temporary table calculates formulas faster than standard tables.
  • A temporary table allows analysts to repeatedly work with the same subset of data.
  1. Which of the following SQL queries adds a table into the database?
  • SELECT * FROM table GROUP BY columnA ORDER BY columnB;
  • CREATE TABLE my_table AS (SELECT * FROM other_table);
  • SELECT * FROM table;
  • WITH my_table AS (SELECT * FROM other_table WHERE x = 0);
  1. What is the purpose of using pivot tables?
  • To multiply two arrays and add the results
  • To allow quick copying from one table to another
  • To view data in multiple ways to find insights and trends
  • To allow the use of SQL in spreadsheets
  1. How many different columns have been added to the values section of the pivot table editor?
  • 3
  • 2
  • 6
  • 1
  1. What SQL keyword is used to define a name for a calculated column?
  • SELECT
  • AS
  • FROM
  • WITH
  1. A data analyst uses the following formula to calculate a new column in a SQL query. What best describes the result of the formula?

(colA + colB) / colC = new_col

  • colB is added to colA then the result is multiplied by colC.
  • colB is subtracted from colA then the result is multiplied by colC.
  • colB is added to colA then the result is divided by colC.
  • colB is divided by colC then the result is added to colA.
  1. What is the process of checking and rechecking the quality of your data so that it is complete, accurate, secure, and consistent?
  • Data-driven development
  • Data visualization
  • Data augmentation
  • Data validation
  1. A data analyst finds some data that seems inconsistent. What is the first thing they should do?
  • Remove the inconsistent values.
  • Convert the inconsistent values to JSON.
  • Fill the odd values with filler values.
  • Determine if the inconsistent values are valid.
  1. What is a reason to use a WITH AS clause in a SQL statement?
  • The result is temporary.
  • The result is a pivot table.
  • The result calculates faster.
  • The result is a visualization.
  1. Which of the following SQL statements can be used to create temporary tables in SQL?
  • WITH my_table FROM (SELECT * FROM other_table);
  • WITH my_table AS (SELECT * FROM other_table WHERE x = 0);
  • CREATE TABLE my_table AS (SELECT * FROM other_table);
  • SELECT * FROM table;
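
The contrast between WITH ... AS and CREATE TABLE ... AS can be sketched with sqlite3. The contents of `other_table` are invented for illustration. One assumption to note: the CREATE TABLE statement is written without the outer parentheses here, which is the form SQLite documents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE other_table (x INTEGER, y TEXT);
INSERT INTO other_table VALUES (0, 'a'), (1, 'b'), (0, 'c');
""")

def table_names():
    # List the tables that actually exist in the database.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

# WITH ... AS: my_table exists only for the duration of this statement.
cte_rows = conn.execute("""
    WITH my_table AS (SELECT * FROM other_table WHERE x = 0)
    SELECT y FROM my_table ORDER BY y
""").fetchall()
tables_after_with = table_names()

# CREATE TABLE ... AS: my_table is added to the database.
conn.execute("CREATE TABLE my_table AS SELECT * FROM other_table")
tables_after_create = table_names()

print(cte_rows, tables_after_with, tables_after_create)
```
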
  1. A data analyst wants to calculate the number of rows that have a SKU value of “K102145”. Which function could they use?
  • =COUNTIF(G2:G30,K102145)
  • =COUNTIF(K102145=G2:G30)
  • =COUNTIF(G2:G30,“=K102145”)
  • =COUNTIF(G2:G30,“K102145”)
  1. A data analyst wants to use a single function to multiply two ranges and then add the multiplied values. What single function can they use to accomplish this?
  • SUM
  • SUMPRODUCT
  • SUMIF
  • SUMIFS
  1. Which values of Date and Direction are used to calculate the value 450 in the following pivot table?
  • 2/3 and Down
  • 2/4 and Up
  • 2/5 and Down
  • 2/4 and Down
  1. When writing custom calculations in SQL, what characters can be used to group calculations to change the order of calculation?
  • Parentheses – ( )
  • Curly Braces – { }
  • Quotation Marks – “ ”
  • Square Brackets – [ ]
  1. A data analyst is trying to manually recalculate a column that was present in their dataset. They want to find rows where the values in their column do not match the values in the original column. Which of the following SQL clauses could they use?
  • WHERE original_column !! recalculated_column
  • WHERE original_column NOT EQUALS recalculated_column
  • WHERE original_column <> recalculated_column
  • WHERE original_column ~= recalculated_column
  1. When working with a new dataset, how can you ensure that your data is valid?
  • Personally collect all data that you use in your analysis.
  • Manually check the calculations of calculated columns.
  • Convert all data to JavaScript Object Notation (JSON).
  • Fill in missing values with values that will favor your initial hypothesis.
  1. Which of the following statements about temporary tables is correct?
  • They must be created using the WITH AS SQL clause.
  • They are automatically deleted when the SQL database session ends.
  • They are declared by enclosing a FROM statement between ##.
  • They are a special feature of BigQuery unavailable in other RDBMS.
  1. A data analyst wants to calculate the number of rows that have a value less than 150. Which function could they use?
  • =COUNTIF("<150",G2:G30)
  • =SUMIF("<150",G2:G30)
  • =COUNTIF(G2:G30,"<150")
  • =SUMIF(G2:G30,"<150")
  1. What is the purpose of the EXTRACT function in SQL?
  • Calculate using data extracted from other tables
  • Return a specific key-value pair from a JSON object
  • Return a specific portion of a date
  • Calculate the mathematical extract operation
  1. Which portion of a pivot table do you change if you want to use a different calculation to combine the results?
  • Filter
  • Columns
  • Values
  • Rows
  1. Which of the following statements about temporary tables is correct?
  • They must be created using the WITH AS SQL clause.
  • They are automatically deleted when the SQL database session ends.
  • They are declared by enclosing a FROM statement between ##.
  • They are a special feature of BigQuery unavailable in other RDBMS.

Course challenge

You notice that many cells in the city column, Column K, are missing a value. So, you use the zip codes to research the correct cities. Now, you want to add the cities to each donor’s row. However, you are concerned about making a mistake, such as a spelling typo.

What spreadsheet tool allows you to control what can and cannot be entered in your worksheet in order to avoid typos?

  • List
  • Data validation
  • VLOOKUP
  • Find

Your database contains people who live in many areas of Wyoming. However, it’s important to align your in-house data with the data from Food Justice Rock Springs. You also need to separate your data into the two lists: Donation_Form_List and Postcard_List. They will be based on each city’s distance from Rock Springs.

What SQL function do you use to select all data from the Donation_Form_List organized by zip code?

  • ORGANIZE
  • ORDER BY
  • SEQUENCE
  • ARRANGE BY

You finish cleaning your datasets, so you decide to review Tayen’s email one more time to make sure you completed the task fully. It’s a good thing you checked because you forgot to identify people who have served on the board of directors or board of trustees. She wants to write them a thank-you note, so you need to locate them in the database.

To retrieve only those records that include people who have served on the board of trustees or on the board of directors, you use the WHERE function. Which of the following SQL queries would return the needed information?

  • SELECT *

FROM Donation_Form_List

WHERE Board_Member != 'True' OR Trustee != 'True'

  • SELECT *

FROM Donation_Form_List

WHERE Board_Member != 'True' AND Trustee != 'True'

  • SELECT *

FROM Donation_Form_List

WHERE Board_Member = 'True' OR Trustee = 'True'

  • SELECT *

FROM Donation_Form_List

WHERE Board_Member = 'True' AND Trustee = 'True'
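
The effect of OR versus AND in the WHERE clause can be checked with a small sqlite3 sketch. The `name` column and the four donor rows are invented for illustration; only the table name and the two flag columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Donation_Form_List (name TEXT, Board_Member TEXT, Trustee TEXT);
INSERT INTO Donation_Form_List VALUES
    ('Ana', 'True', 'False'),
    ('Ben', 'False', 'True'),
    ('Cy', 'False', 'False'),
    ('Di', 'True', 'True');
""")

def names(where_clause):
    # Helper for this sketch: run the query with the given condition.
    sql = f"SELECT name FROM Donation_Form_List WHERE {where_clause} ORDER BY name"
    return [name for (name,) in conn.execute(sql)]

# OR: served on either board.
either_board = names("Board_Member = 'True' OR Trustee = 'True'")

# AND: served on both boards, which drops Ana and Ben.
both_boards = names("Board_Member = 'True' AND Trustee = 'True'")

print(either_board, both_boards)
```
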

Tayen informs you that she’s thinking about inviting anyone who donated at least $100 in 2018, as well. However, she only has five open spaces. She asks you to report how many people gave at least $100 so she can determine if they can also be invited to the event.

Which spreadsheet function do you use to count how many donations of $100 or greater appear in Column O (Contributions 2018)?

  • TOTAL
  • MAX
  • SUMIF
  • COUNTIF
  1. Scenario 1, Questions 1-7

For the past six months, you have been working for a direct-mail marketing firm as a junior marketing analyst. Direct mail is advertising material sent to people through the mail. These people can be current or prospective customers, clients, or donors. Many charities depend on direct mail for financial support.

Your company, Directly Dynamic, creates direct-mail pieces with its in-house staff of graphic designers, expert mail list services, and on-site printing. Your team has just been hired by a local nonprofit, Food Justice Rock Springs. The mission of Food Justice Rock Springs is to eliminate food deserts by establishing local gardens, providing mobile pantries, educating residents, and more. Click below to read the email from Tayen Bell, vice president of marketing and outreach.

You begin by reviewing the dataset. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: Dynamic Dataset

Or, if you don’t have a Google account, download the file directly from the attachment below.

The client has asked you to send two separate mailings: one to people within 50 miles of Rock Springs; the other to anyone outside that area. So, to research each donor’s distance from the city, you first need to find out where all of these people live.

You could scroll through 209 rows of data, but you know there is a more efficient way to organize the cities.

Which of the following procedures will enable you to sort your spreadsheet by city (Column K) in ascending order? Select all that apply.

  • Select A2-R210, then use the drop-down menu to Sort Sheet by Column K from A to Z
  • Use the SORT function syntax: =SORT(A2:R210, 11, TRUE)
  • Select A2-R210, then use the drop-down menu to Sort Range by Column K from A to Z
  • Use the SORT function syntax: =SORT(A2:R210, K, TRUE)
  1. Scenario 1, continued

You notice that many cells in the city column, Column K, are missing a value. So, you use the zip codes to research the correct cities. Now, you want to add the cities to each donor’s row. However, you are concerned about making a mistake, such as a spelling typo.

Fill in the blank: To add drop-down lists to your worksheet with predetermined options for each city name, you decide to use _____.

  • VLOOKUP
  • the find tool
  • data validation
  • the LIST function
  1. Scenario 1, continued

Now, you decide to address Tayen’s request to include a handwritten note in the direct-mail piece for anyone who gave at least $100 last year.

Which of the following spreadsheet tools will enable you to change how cells appear if they contain a value of $100 or more?

  • Conditional formatting
  • The COUNTA function
  • The MAX function
  • Data validation
  1. Scenario 1, continued

At this point, you notice that the information about state and zip code is in the same cell. However, your company’s mailing list software requires states to be on a separate line from zip codes.

To move the 5-digit zip code in cell L2 into its own column, you use the function =LEFT(L2,5).

  • True
  • False
  1. Scenario 1, continued

Next, you duplicate your dataset twice using the Sheet Menu. You rename the first sheet Donation Form List, and you remove the cities that are further than 50 miles from Rock Springs. You rename the second sheet Postcard List, and you remove the cities that are within 50 miles of Rock Springs.

Then, you import these datasets into your company’s mailing list database. In a mailing list database, you create two tables: Donation_Form_List and Postcard_List. You decide to clean the Donation_Form_List first.

Your company’s mailing list software requires units to be on the same line as street addresses. However, they are currently in two separate columns (street_address and unit).

What portion of your SQL statement will instruct the database to combine these two columns into a new column called “address”?

  • CONCAT(street_address to unit) AS address
  • JOIN(street_address, " to ", unit) AS address
  • CONCAT(street_address, " to ", unit) AS address
  • JOIN(street_address to unit) AS address
  1. Scenario 1, continued

Your database contains people who live across Wyoming. However, it’s important to align your in-house data with the data from Food Justice Rock Springs. You also need to separate your data into the two lists: Donation_Form_List and Postcard_List. They will be based on each city’s distance from Rock Springs.

The zip codes are in a column called zip_code. What query do you use to select all data from the Donation_Form_List organized by zip code?

  1. Scenario 1, continued

You finish cleaning your datasets, so you decide to review Tayen’s email one more time to make sure you completed the task fully. It’s a good thing you checked because you forgot to identify people who have served on the board of directors or board of trustees. She wants to write them a thank-you note, so you need to locate them in the database.

To retrieve only those records that include people who have served on the board of trustees or on the board of directors, you use the WHERE function. The syntax is:

  • True
  • False
  1. Scenario 2, Questions 8-13

Your company’s direct-mail campaign was very successful, and Food Justice Rock Springs has continued partnering with Directly Dynamic. One thing you’ve been working on is assigning all donors identification numbers. This will enable you to clean and organize the lists more effectively.

Meanwhile, another team member has been creating a prospect list that contains data about people who have indicated interest in getting involved with Food Justice Rock Springs. These people are also assigned a unique ID. Now, you need to compare your donor list with the dataset in your database and collect certain data from both.

What SQL function will return records with matching values in both tables?

  • OUTER JOIN
  • INNER JOIN
  • LEFT JOIN
  • RIGHT JOIN
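
The difference between the join types can be sketched with Python's sqlite3 module. The table names, columns, and rows below are hypothetical stand-ins for the donor and prospect lists. The sketch compares INNER JOIN with LEFT JOIN on a shared ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical donor and prospect tables that share an id column.
conn.executescript("""
    CREATE TABLE donors (id INTEGER, name TEXT);
    CREATE TABLE prospects (id INTEGER, interest TEXT);
    INSERT INTO donors VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO prospects VALUES (2, 'gardens'), (3, 'pantries'), (4, 'education');
""")

# INNER JOIN keeps only the ids that appear in both tables.
inner_rows = conn.execute(
    "SELECT d.id, d.name, p.interest FROM donors d "
    "INNER JOIN prospects p ON d.id = p.id ORDER BY d.id"
).fetchall()

# LEFT JOIN keeps every donor and fills unmatched prospect columns with NULL.
left_rows = conn.execute(
    "SELECT d.id, d.name, p.interest FROM donors d "
    "LEFT JOIN prospects p ON d.id = p.id ORDER BY d.id"
).fetchall()

print(inner_rows)
print(left_rows)
```

With this sample data the inner join returns two rows (ids 2 and 3), while the left join returns all three donors, with None in place of the missing prospect data for id 1.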
  1. Scenario 2, continued

Your next task is to identify the average contribution given by donors over the past two years. Tayen will use this information to set a donation minimum for inviting donors to an upcoming event.

You have performed the calculations for 2019, so now you move on to 2020. To return average contributions in 2020 (contributions_2020), you use the AVG function. You use the following section of a SQL query to find this average and store it in the AvgLineTotal variable:

AVG(contributions_2020) AS AvgLineTotal

  • True
  • False
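
To see what AVG combined with AS produces, here is a small sketch using Python's sqlite3 module. The donors table and its three contribution values are invented for illustration. Strictly speaking, AS assigns a column alias to the result, which is what the question calls a variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read result columns by name

# Hypothetical table; only the contributions_2020 column name comes from the question.
conn.execute("CREATE TABLE donors (donor_id INTEGER, contributions_2020 REAL)")
conn.executemany(
    "INSERT INTO donors VALUES (?, ?)",
    [(1, 50.0), (2, 100.0), (3, 150.0)],
)

# AVG computes the mean; AS labels the result column AvgLineTotal.
row = conn.execute(
    "SELECT AVG(contributions_2020) AS AvgLineTotal FROM donors"
).fetchone()
avg_line_total = row["AvgLineTotal"]
print(avg_line_total)
```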
  1. Scenario 2, continued

Now that you provided her with the average donation amount, Tayen decides to invite 50 people to the grand opening of a new community garden. You return to your New Donor List spreadsheet to determine how much each donor gave in the past two years. You will use that information to identify the 50 top donors and invite them to the event.

What is the correct syntax to add the contribution amounts in cells O2 and P2?

  • =SUM(O2*P2)
  • =SUM(O2/P2)
  • =SUM("O2,P2")
  • =SUM(O2,P2)
  1. Scenario 2, continued

Tayen informs you that she’s thinking about inviting anyone who donated at least $100 in 2018, as well. However, she only has five open spaces. She asks you to report how many people gave at least $100 so she can determine if they can also be invited to the event.

The correct syntax to count how many donations of $100 or greater appear in Column O is =SUMIF(O2:O210,">=100").

  • True
  • False
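
The distinction between counting and summing matching cells can be mimicked in plain Python. The donation amounts below are made up for illustration. The two expressions approximate what COUNTIF and SUMIF do with the criterion ">=100".

```python
# Hypothetical donation amounts standing in for Column O.
donations = [25, 100, 250, 75, 100]

# COUNTIF-style: how many values meet the criterion.
count_at_least_100 = sum(1 for amount in donations if amount >= 100)

# SUMIF-style: the total of the values that meet the criterion.
sum_at_least_100 = sum(amount for amount in donations if amount >= 100)

print(count_at_least_100, sum_at_least_100)
```

Tayen asks how many people gave at least $100, which is a count of matching cells rather than a total of their values.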
  1. Scenario 2, continued

The community garden grand opening was a success. In addition to the 55 donors Food Justice Rock Springs invited, 20 other prospects attended the event. Now, Tayen wants to know more about the donations that came in from new prospects compared to the original donors.

This SQL query can be used to identify the percentage of contributions from prospects compared to total donors:

  • True
  • False
  1. Scenario 2, continued

Your team creates a highly effective prospects list for Food Justice Rock Springs. After a few months, many of these prospects become donors. Now, Tayen wants to know the top three cities in which these new donors live. She will use that information to determine if it’s still true that people who live closer to Rock Springs are more likely to donate.

To retrieve the number of donors in each city, sorted high to low, you use the following query:

  • True
  • False

Shuffle Q/A

  1. Scenario 1, Questions 1-7

For the past six months, you have been working for a direct-mail marketing firm as a junior marketing analyst. Direct mail is advertising material sent to people through the mail. These people can be current or prospective customers, clients, or donors. Many charities depend on direct mail for financial support.

Your company, Directly Dynamic, creates direct-mail pieces with its in-house staff of graphic designers, expert mail list services, and on-site printing. Your team has just been hired by a local nonprofit, Food Justice Rock Springs. The mission of Food Justice Rock Springs is to eliminate food deserts by establishing local gardens, providing mobile pantries, educating residents, and more. Click below to read the email from Tayen Bell, vice president of marketing and outreach.

You begin by reviewing the dataset. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: Dynamic Dataset

Or, if you don’t have a Google account, download the file directly from the attachment below.

The client has asked you to send two separate mailings: one to people within 50 miles of Rock Springs; the other to anyone outside that area. So, to research each donor’s distance from the city, you first need to find out where all of these people live.

You could scroll through 209 rows of data, but you know there is a more efficient way to organize the cities.

Which of the following functions will enable you to sort your spreadsheet by city (Column K) in ascending order?

  • =SORT(A2:R210, 11, TRUE)
  • =SORT(A2:R210, K, ASC)
  • =SORT(A2:R210, K, TRUE)
  • =SORT(A2:R210, 11, ASC)
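
In Google Sheets, SORT takes a range, a column position counted from the left edge of that range, and TRUE or FALSE for ascending order. Column K is the 11th column of A2:R210. A minimal Python sketch of the same idea follows. The sort_range helper and the three-column sample rows are hypothetical.

```python
def sort_range(rows, sort_column, is_ascending):
    """Sort rows by a 1-based column position, like SORT(range, column, is_ascending)."""
    return sorted(rows, key=lambda row: row[sort_column - 1], reverse=not is_ascending)


# Hypothetical three-column range: name, city, contribution.
donors = [
    ("Ana", "Rock Springs", 120),
    ("Ben", "Cheyenne", 40),
    ("Cy", "Laramie", 300),
]

# City is the 2nd column of this small range, so sort_column is 2.
by_city = sort_range(donors, 2, True)
print(by_city)
```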
  1. Scenario 1, continued

At this point, you notice that the information about state and zip code is in the same cell. However, your company’s mailing list software requires states to be on a separate line from zip codes.

What function do you use to move the 5-digit zip code in cell L2 into its own column?

  • =LEFT(L2,5)
  • =RIGHT(5,L2)
  • =LEFT(5,L2)
  • =RIGHT(L2,5)
  1. Scenario 1, continued

Next, you duplicate your dataset twice using the Sheet Menu. You rename the first sheet Donation Form List, and you remove the cities that are further than 50 miles from Rock Springs. You rename the second sheet Postcard List, and you remove the cities that are within 50 miles of Rock Springs.

Then, you import these datasets into your company’s mailing list database. In a mailing list database, you create two tables: Donation_Form_List and Postcard_List. You decide to clean the Donation_Form_List first.

Your company’s mailing list software requires units to be on the same line as street addresses. However, they are currently in two separate columns (street_address and unit).

You use a SQL function to instruct the database to combine the two columns into a new column called “address.” The syntax is: JOIN(street_address, " to ", unit) as address.

  • True
  • False
  1. Scenario 1, continued

You finish cleaning your datasets, so you decide to review Tayen’s email one more time to make sure you completed the task fully. It’s a good thing you checked because you forgot to identify people who have served on the board of directors or board of trustees. She wants to write them a thank-you note, so you need to locate them in the database.

To retrieve only those records that include people who have served on the board of trustees or on the board of directors, what is the correct query?

  1. Scenario 2, Questions 8-13

Your company’s direct-mail campaign was very successful, and Food Justice Rock Springs has continued partnering with Directly Dynamic. One thing you’ve been working on is assigning all donors identification numbers. This will enable you to clean and organize the lists more effectively.

Meanwhile, another team member has been creating a prospect list that contains data about people who have indicated interest in getting involved with Food Justice Rock Springs. These people are also assigned a unique ID. Now, you need to compare your donor list with the dataset in your database and collect certain data from both.

What SQL function will return all records from the left table and only the matching records from the right?

  • INNER JOIN
  • OUTER JOIN
  • LEFT JOIN
  • RIGHT JOIN
  1. Scenario 2, continued

Your next task is to identify the average contribution given by donors over the past two years. Tayen will use this information to set a donation minimum for inviting donors to an upcoming event.

You start with 2019. To return average contributions in 2019 (contributions_2019), you use the AVG function. What portion of your SQL statement will instruct the database to find this average and store it in the AvgLineTotal variable?

  • AVG("contributions_2019") IN AvgLineTotal
  • AVG(contributions_2019) AS AvgLineTotal
  • AVG("contributions_2019") AS AvgLineTotal
  • AVG(contributions_2019) = "AvgLineTotal"
  1. Scenario 2, continued

Tayen informs you that she’s thinking about inviting anyone who donated at least $100 in 2018, as well. However, she only has five open spaces. She asks you to report how many people gave at least $100 so she can determine if they can also be invited to the event.

What is the correct syntax to count how many donations of $100 or greater appear in Column O?

  • =COUNTIF(02:2010,"<=100")
  • =COUNTIF(O2:O210,">=100")
  • =SUMIF(02:2010,">=100")
  • =SUMIF(O2:2010,">=100")
  1. Scenario 2, continued

The community garden grand opening was a success. In addition to the 55 donors Food Justice Rock Springs invited, 20 other prospects attended the event. Now, Tayen wants to know more about the donations that came in from new prospects compared to the original donors.

Which SQL query can be used to calculate the percentage of contributions from prospects?

  1. Scenario 2, continued

Your team creates a highly effective prospects list for Food Justice Rock Springs. After a few months, many of these prospects become donors. Now, Tayen wants to know the top three cities in which these new donors live. She will use that information to determine if it’s still true that people who live closer to Rock Springs are more likely to donate.

What clause do you add to the following query to sort the donors in each city from high to low?

  • ORDER BY CITY(DonorID) ASC
  • ORDER BY COUNT(DonorID) DESC
  • ORDER BY CITY(DonorID) DESC
  • ORDER BY COUNT(DonorID) ASC
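
The query this question refers to is not reproduced here, so the following is only a sketch of how such a query might look, run through Python's sqlite3 module. The new_donors table, its rows, and the LIMIT 3 are assumptions. Only DonorID and the ORDER BY clause come from the options above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of new donors and the cities they live in.
conn.execute("CREATE TABLE new_donors (DonorID INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO new_donors VALUES (?, ?)",
    [
        (1, "Rock Springs"), (2, "Rock Springs"), (3, "Rock Springs"),
        (4, "Green River"), (5, "Green River"),
        (6, "Laramie"),
    ],
)

# GROUP BY makes one row per city; COUNT tallies donors in each group;
# ORDER BY ... DESC puts the largest counts first; LIMIT keeps the top three.
top_cities = conn.execute(
    "SELECT city, COUNT(DonorID) AS donor_count "
    "FROM new_donors GROUP BY city "
    "ORDER BY COUNT(DonorID) DESC LIMIT 3"
).fetchall()
print(top_cities)
```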
  1. Scenario 1, Questions 1-7

For the past six months, you have been working for a direct-mail marketing firm as a junior marketing analyst. Direct mail is advertising material sent to people through the mail. These people can be current or prospective customers, clients, or donors. Many charities depend on direct mail for financial support.

Your company, Directly Dynamic, creates direct-mail pieces with its in-house staff of graphic designers, expert mail list services, and on-site printing. Your team has just been hired by a local nonprofit, Food Justice Rock Springs. The mission of Food Justice Rock Springs is to eliminate food deserts by establishing local gardens, providing mobile pantries, educating residents, and more. Click below to read the email from Tayen Bell, vice president of marketing and outreach.

You begin by reviewing the dataset. To use the template for this dataset, click the link below and select “Use Template.”

Link to template: Dynamic Dataset

Or, if you don’t have a Google account, download the file directly from the attachment below.

The client has asked you to send two separate mailings: one to people within 50 miles of Rock Springs; the other to anyone outside that area. So, to research each donor’s distance from the city, you first need to find out where all of these people live.

You could scroll through 209 rows of data, but you know there is a more efficient way to organize the cities.

Which of the following tools will enable you to sort your spreadsheet by city (Column K) in ascending order?

  • Sort Range by Column K from A to Z
  • Sort Range by Column K from Z to A
  • Sort Sheet by Column K from A to Z
  • Sort Sheet by Column K from Z to A
  1. Scenario 1, continued

Now, you decide to address Tayen’s request to include a handwritten note in the direct-mail piece for anyone who gave at least $100 last year.

Which of the following procedures will enable you to change how cells in your spreadsheet appear if they contain a value of $100 or more?

  • Select Column M. Then, select Format > Conditional Formatting. Choose to format cells if they are greater than 100.
  • Select Column M. Then, select Format > Conditional Formatting. Choose to format cells if text starts with 100.
  • Select Column M. Then, select Format > Conditional Formatting. Choose to format cells if text contains 100.
  • Select Column M. Then, select Format > Conditional Formatting. Choose to format cells if they are greater than or equal to 100.
  1. Scenario 1, continued

Your database contains people who live in many areas of Wyoming. However, it’s important to align your in-house data with the data from Food Justice Rock Springs. You also need to separate your data into the two lists: Donation_Form_List and Postcard_List. They will be based on each city’s distance from Rock Springs.

The zip codes are in a column called zip_code. To select all data from the Donation_Form_List organized by zip code, you use the ORDER BY function. The syntax is:

  • True
  • False
  1. Scenario 1, continued

You finish cleaning your datasets, so you decide to review Tayen’s email one more time to make sure you completed the task fully. It’s a good thing you checked because you forgot to identify people who have served on the board of directors or board of trustees. She wants to write them a thank-you note, so you need to locate them in the database.

To retrieve only those records that include people who have served on the board of trustees or on the board of directors, what clause do you include in your query?

  • WHERE Board_Member = "TRUE" AND Trustee = "TRUE"
  • WHERE Board_Member = "TRUE" OR Trustee = "TRUE"
  • WHERE Board_Member = TRUE AND Trustee = TRUE
  • WHERE Board_Member = TRUE, Trustee = TRUE
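
How AND and OR change the result of a WHERE clause can be checked with a short sqlite3 sketch. The people table and its rows are hypothetical, and the sketch assumes the two columns store the text values 'TRUE' and 'FALSE'. It uses single-quoted string literals, which standard SQL expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; Board_Member and Trustee are assumed to hold text flags.
conn.execute("CREATE TABLE people (name TEXT, Board_Member TEXT, Trustee TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Ana", "TRUE", "FALSE"),
        ("Ben", "FALSE", "FALSE"),
        ("Cy", "FALSE", "TRUE"),
        ("Di", "TRUE", "TRUE"),
    ],
)

# OR matches anyone who served on either board.
either_names = [row[0] for row in conn.execute(
    "SELECT name FROM people "
    "WHERE Board_Member = 'TRUE' OR Trustee = 'TRUE' ORDER BY name"
)]

# AND matches only people who served on both boards.
both_names = [row[0] for row in conn.execute(
    "SELECT name FROM people "
    "WHERE Board_Member = 'TRUE' AND Trustee = 'TRUE' ORDER BY name"
)]

print(either_names, both_names)
```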
  1. Scenario 2, continued

Your team creates a highly effective prospects list for Food Justice Rock Springs. After a few months, many of these prospects become donors. Now, Tayen wants to know the top three cities in which these new donors live. She will use that information to determine if it’s still true that people who live closer to Rock Springs are more likely to donate.

Which SQL query will retrieve the number of donors in each city, sorted high to low?

  1. Scenario 1, continued

At this point, you notice that the information about state and zip code is in the same cell. However, your company’s mailing list software requires states to be on a separate line from zip codes.

What function will enable you to move the 2-character state abbreviation in cell L2 into its own column?

  • =RIGHT(L2,2)
  • =RIGHT(2,L2)
  • =LEFT(2,L2)
  • =LEFT(L2,2)
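
The LEFT and RIGHT spreadsheet functions take the text first and the number of characters second. A minimal Python sketch of their behavior follows. It assumes cell L2 holds the state abbreviation followed by the zip code, for example "WY 82901". That layout and the helper names are assumptions, not details confirmed by the dataset.

```python
def left(text, num_chars):
    """Return the first num_chars characters, like =LEFT(text, num_chars)."""
    return text[:num_chars]


def right(text, num_chars):
    """Return the last num_chars characters, like =RIGHT(text, num_chars)."""
    return text[-num_chars:] if num_chars > 0 else ""


cell_l2 = "WY 82901"  # assumed layout: state abbreviation, space, zip code

state = left(cell_l2, 2)
zip_code = right(cell_l2, 5)
print(state, zip_code)
```

Under that assumed layout, taking five characters from the left would return the state plus part of the zip code instead of the zip code alone.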

Course 6 – Share Data Through the Art of Visualization

Week 1 – Visualizing data

A data analyst notices that two variables in their data seem to rise and fall at the same time. They recognize that these variables are related somehow. What is this an example of?

  • Correlation
  • Visualization
  • Causation
  • Tabulation
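
Correlation can be quantified with the Pearson coefficient, which ranges from -1 to 1. The sketch below computes it by hand in Python for two invented series. The temperature and sales figures are purely illustrative. A value near 1 says the two variables rise and fall together, but it does not show that one causes the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical data: daily temperature and daily ice cream sales.
temperature = [20, 25, 30, 35, 40]
sales = [110, 135, 160, 170, 205]

r = pearson(temperature, sales)
print(round(r, 3))
```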

A data analyst adds labels to their line graph to make it easier to read, even though they already have a legend on their visualizations. How does labeling the data make it more accessible?

  • Labeling gives the same information as the legend.
  • Labeling adds contrast to a visualization.
  • Labeling does not depend on interpreting colors.
  • Labeling hides unnecessary information.

You are going to give a presentation to a broad audience. How can you make sure your visualizations are accessible to all members of the audience? Select two that apply.

  • Include a lot of text in the visualization
  • Minimize contrast between colors
  • Label data directly when possible
  • Provide text alternatives
  1. A data analyst wants to create a visualization that demonstrates how often data values fall into certain ranges. What type of data visualization should they use?
  • Line graph
  • Scatter plot
  • Histogram
  • Correlation chart
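
A histogram groups values into ranges (bins) and shows how many values fall into each one. The following Python sketch builds a text histogram from made-up numbers. The minutes_on_site values and the bin width of 10 are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical values: minutes each user spent on a site.
minutes_on_site = [3, 7, 12, 14, 18, 21, 22, 25, 38]
bin_width = 10

# Map each value to the start of its bin (0, 10, 20, ...) and count per bin.
bins = Counter((m // bin_width) * bin_width for m in minutes_on_site)

# Print one bar per bin; the bar length is the number of values in that range.
for start in sorted(bins):
    print(f"{start:>2}-{start + bin_width - 1:<2} | {'#' * bins[start]}")
```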
  1. What do correlation charts reveal about the data they contain?
  • Causation
  • Relationships
  • Changes
  • Visualization
  1. You are creating a presentation for stakeholders and are choosing whether to include static or dynamic visualizations. Describe the difference between static and dynamic visualizations.
  • Static visualizations are interactive and can automatically change over time. Dynamic visualizations do not change over time unless they’re edited.
  • Static visualizations do not change over time unless they’re edited. Dynamic visualizations are interactive and can automatically change over time.
  • Static visualizations combine multiple visualizations into a whole. Dynamic visualizations separate out the individual elements of a single visualization.
  • Static visualizations separate out the individual elements of a single visualization. Dynamic visualizations combine multiple visualizations into a whole.
  1. Sophisticated use of contrast helps separate the most important data from the rest using the visual context that our brains naturally respond to.
  • True
  • False
  1. Design thinking is a process used to solve complex problems in a visually appealing way.
  • True
  • False
  1. Fill in the blank: During the _____ phase of the design process, you start to generate data visualization ideas.
  • empathize
  • ideate
  • test
  • define
  1. A data analyst adds labels to their line graph to make it easier to read even though they already have a legend on their visualizations. How does labeling the data make it more accessible?
  • Labeling doesn’t depend on interpreting colors
  • Labeling adds contrast to a visualization
  • Labeling creates more visual interest
  • Labeling helps redirect focus from outliers
  1. Fill in the blank: You should distinguish elements of your data visualization by _____ the foreground and background and using contrasting colors and shapes. This makes the content more accessible.
  • highlighting
  • separating
  • overlapping
  • aligning

Shuffle Q/A

  1. A data analyst working for an e-commerce website creates the following data visualization to present the amount of time users spend on the site:

What type of visualization is this?

  • Correlation chart
  • Histogram
  • Line graph
  • Scatterplot
  1. A data analyst is creating a chart for a presentation. The data they will display shows a correlation between variables. Why should they be careful when presenting their chart to an audience?
  • Correlation can be misunderstood as causation.
  • Correlation causes accessibility issues.
  • Correlation should be avoided in charts.
  • Correlation can only be represented in bar charts.
  1. What type of data visualizations allow users to have some control over what they see?
  • Aesthetic visualizations
  • Dynamic visualizations
  • Geometric visualizations
  • Static visualizations
  1. Design thinking is a process used to solve problems in a user-centric way.
  • True
  • False
  1. During which phase of the design process do you try to understand the emotions and needs of your target audience?
  • Prototype
  • Ideate
  • Test
  • Empathize
  1. A data analyst wants to make their visualizations more accessible by adding text explanations directly on the visualization. What is this called?
  • Distinguishing
  • Subtitling
  • Labeling
  • Simplifying
  1. What should data analysts do to make presentations more accessible for people who are blind and people with low vision?
  • Minimize contrast between colors
  • Remove labels from data
  • Provide text alternatives
  • Avoid using shapes and patterns to differentiate data
  1. You need to create a chart that displays the number of data records in each age group of a dataset. What type of chart would best represent this data?
  • Histogram Chart
  • Ranked Bar Chart
  • Correlation Chart
  • Time Series Chart
  1. Which of the following is generally good practice when using bar charts?
  • Display the bars in ranked order
  • Make the gaps wider than the bars.
  • Design bar charts with a single color.
  • Avoid stacked bar charts.
  1. What are the key elements of effective visualizations you should focus on when creating data visualizations? Select all that apply.
  • Clear meaning
  • Sophisticated use of contrast
  • Visual form
  • Refined execution
  1. Fill in the blank: Design thinking is a process used to solve complex problems _____.
  • as quickly as possible
  • in a user-centric way
  • using a set order of processes
  • with minimal user input
  1. Fill in the blank: A data analyst can make their visualizations more accessible by adding _____, which are text explanations placed directly on the visualizations.
  • labels
  • legends
  • callouts
  • subheadings
  1. Distinguishing elements of your data visualizations makes the content easier to see. This can help make them more accessible for audience members with visual impairments. What are some methods data analysts use to distinguish elements?
  • Ensure all elements are highlighted equally
  • Separate the foreground and background
  • Use similar colors and shapes
  • Add a legend
  1. You need to create a chart that explores how temperature changes throughout the year. What type of chart would best represent this data?
  • Correlation Chart
  • Time Series Chart
  • Histogram
  • Ranked Bar Chart
  1. What type of visualizations give you the most control over the story you want to tell with your data?
  • Static visualizations
  • Dynamic visualizations
  • Aesthetic visualizations
  • Geometric visualizations
  1. Fill in the blank: When choosing a chart you should choose the one that _____.
  • makes use of the most modern visualization tool
  • uses the least number of visual elements like size and shape
  • uses as many visual elements like size and shape as possible
  • makes it easiest to understand the point you are trying to make
  1. A data analyst is designing a chart. They decide to use colors that make sense to their audience. What phase of creating data visualizations does this describe?
  • Test Phase
  • Ideate Phase
  • Prototype Phase
  • Empathize Phase
  1. During which phase of the design process do you start to generate data visualization ideas?
  • Ideate
  • Test
  • Empathize
  • Define
  1. What should you include in the headline of a data visualization?
  • Abbreviations
  • Clear language
  • Acronyms
  • Fancy typography
  1. A data analyst is making their data visualization more accessible. They separate the background and the foreground of the visualization using bright, contrasting colors. What does this describe?
  • Labeling
  • Text alternatives
  • Distinguishing
  • Text-based format
  1. Causation occurs when an action directly leads to an outcome.
  • True
  • False
  1. What types of charts are effective for presenting the composition of data? Select all that apply.
  • Pie chart
  • Line chart
  • Tree map
  • Heat map
  1. When using design thinking, what group of people should you think about the most?
  • The general public
  • Your team
  • The shareholders
  • Your users
  1. You are in the ideate phase of the design process. What are you doing at this stage?
  • Making changes to your data visualization
  • Generating visualization ideas
  • Creating data visualizations
  • Sharing data visualizations with a test audience
  1. Where is the best place to put labels that describe the meaning of individual data elements in a data visualization?
  • Left of the chart area
  • In the legend
  • In the data
  • Below the chart area
  1. Fill in the blank: A data analyst creates a presentation for stakeholders. They include _____ visualizations because they don’t want the visualizations to change unless they choose to edit them.
  • aesthetic
  • dynamic
  • static
  • geometric
  1. While creating a chart to share their findings, a data analyst uses the color red to make important data stand out and separate it from the rest of the visualization. Which element of effective visualization does this describe?
  • Refined execution
  • Clear meaning
  • Sophisticated use of contrast
  • Subtitles
  1. You are in the process of creating data visualizations. You have considered the goal, the audience's needs, and come up with an idea. Next, you will share the visualization with peers. What phase of the design process will you be in?
  • Ideate
  • Define
  • Test
  • Empathize
  1. What text element in a visualization should be placed above the chart and clearly state what data is being presented?
  • Headline
  • Label
  • Annotation
  • Subtitle
  1. How much data should you represent when designing an effective data visualization?
  • Include a subset of the data that your audience will like
  • Only represent data that supports your initial hypothesis
  • Include all of the data from your analysis to ensure that your data visualization is complete and accurate
  • Only represent data the audience needs to understand your findings, unless it is misleading

Week 2 – Creating data visualizations with Tableau

A data analyst is using the Color tool in Tableau to apply a color scheme to a data visualization. They want the visualization to be accessible for people with color vision deficiencies, so they use a color scheme with lots of contrast. What does it mean to have contrast?

  • The color scheme is graphically pleasing.
  • The color scheme uses a range of different colors.
  • The color scheme is uniform.
  • The color scheme is monotone.

You are working with the World Happiness data in Tableau. What tool do you use to change your point of view of Greece?

  • Lasso
  • Pan
  • Rectangular
  • Radial
  1. Tableau is used to create interactive and dynamic visualizations. A visualization is interactive when the audience can control what data they see. What does it mean for a visualization to be dynamic?
  • The visualization can change over time
  • The visualization cannot be altered
  • The visualization can be downloaded
  • The visualization can include audio
  1. A data analyst uses the Color tool in Tableau to apply a color scheme to a data visualization. In order to make the visualization accessible for people with color vision deficiencies, what should they do next?
  • Make sure the color scheme has contrast
  • Make sure the color scheme is uniform
  • Make sure the color scheme uses only one color, in various shades
  • Make sure the color scheme is stylish
  1. You are working with the World Happiness data in Tableau. What tool do you use to select the area on the map representing Central America?
  • Radial
  • Lasso
  • Pan
  • Rectangular
  1. Fill in the blank: A data analyst is working with the World Happiness data in Tableau. To get a better view of Moldova, they use the _____ tool.
  • Lasso
  • Pan
  • Rectangular
  • Radial
  1. You are using the Label tool in Tableau. What will it enable you to do with the World Happiness map visualizations?
  • Separate out a selected country on the map
  • Hide certain countries on the map
  • Display the population of each country on the map
  • Increase the size of a country on the map
  1. You are working with the World Happiness data in Tableau. Which tool will enable you to show certain data while hiding the rest?
  • Format
  • Dimension
  • Filter
  • Attribute
  1. By default, all visualizations you create using Tableau Public are available to other users. What icon do you click to hide a visualization?
  • Source
  • Close
  • Private
  • Eye
  1. Fill in the blank: In Tableau, a diverging palette displays two value ranges. It uses a color to show the range where a data point is from and color intensity to show its ______.
  • magnitude
  • purpose
  • origination
  • attributes

Shuffle Q/A

  1. Tableau is used to create dynamic and interactive visualizations. Dynamic visualizations can change over time. What does it mean for a visualization to be interactive?
  • The audience can export the datasets
  • The audience can control what data they see
  • The audience can listen to audio about the data
  • The audience can collaborate on changes to the data
  1. A data analyst uses the Color tool in Tableau to apply a color scheme to a data visualization. Why do they make sure the color scheme has contrast?
  • To make the visualization more stylish for users to enjoy
  • To make the visualization more elaborate
  • To make the visualization uniform
  • To make the visualization accessible for people with color vision deficiencies
  1. A data analyst is working with the World Happiness data in Tableau. What tool do they use to select the area on the map representing Finland?
  • Pan
  • Rectangular
  • Radial
  • Lasso
  1. A data analyst is using the Pan tool in Tableau. What are they doing?
  • Rotating the perspective while keeping a certain object in view
  • Taking a screenshot of the visualization
  • Copying a data point to a second location in the visualization
  • Deselecting a data point from within the visualization
  1. Fill in the blank: In Tableau, the Label tool is located on the _____ shelf.
  • pages
  • columns
  • rows
  • marks
  1. A data analyst is giving a presentation with the World Happiness data in Tableau. Their insights focus only on those countries with a happiness score greater than 4.5. What tool can they use to show only those countries while hiding the rest?
  • Format
  • Filter
  • Attribute
  • Dimension
  1. An analyst working in Tableau uses color to show the range where a data point is from and intensity to show its magnitude. What is this called?
  • Value overlay
  • Diverging palette
  • Color attribute
  • Conditional formatting
  1. Fill in the blank: In Tableau, a _____ visualization is one that can change over time.
  • interpretive
  • dynamic
  • interactive
  • sensitive
  1. You are designing a visualization in Tableau and you want to ensure it is accessible. What can you apply with the Color tool in Tableau to make your visualization accessible for people with color vision deficiencies?
  • palette
  • filtering
  • variation
  • contrast
  1. You are working with the World Happiness data in Tableau. What tool do you use to change your point of view of Italy?
  • Radial
  • Rectangular
  • Pan
  • Lasso
  1. A data analyst working with the World Happiness data in Tableau displays the populations of each country in their visualization. What tool did they use?
  • Detail
  • Tooltip
  • Size
  • Label
  1. A data analyst is creating a visualization in Tableau Public. They want to keep it private from other users until it is complete. Which icon should they click?
  • Source
  • Close
  • Private
  • Eye
  1. Fill in the blank: When using Tableau, people can control what data they see in a visualization. This is an example of Tableau being _____.
  • indefinable
  • interpretive
  • inanimate
  • interactive
  1. A data analyst working with the World Happiness data in Tableau is only interested in those countries that have a happiness score of less than 3.5. What tool can they use to only show these countries?
  • Dimension
  • Attribute
  • Format
  • Filter
  1. A data analyst is creating a visualization in Tableau Public. Before they began, they clicked on the eye icon. What is the purpose of this?
  • It hides the visualization from other users.
  • It generates a new copy of the visualization.
  • It saves the visualization.
  • It gives access to Tableau’s options.
  1. Fill in the blank: In Tableau, a _____ palette displays two value ranges. Color shows the range where a data point is from and color intensity shows its magnitude.
  • diverging
  • overlaying
  • inverting
  • contrasting
  1. What could a data analyst do with the Lasso tool in Tableau?
  • Zoom in on a data point
  • Move a data point
  • Select a data point
  • Pan across data points
  1. Fill in the blank: In Tableau Public, the _____ icon will hide your visualization from other users.
  • close
  • eye
  • source
  • private
  1. Fill in the blank: In Tableau, a diverging palette displays two value ranges. It uses a color to show the range where a data point is from and _____ to show its magnitude.
  • markers
  • borders
  • intensity
  • color overlays
  1. A data analyst creates a visualization in Tableau that allows the audience to change what data they want to see. What is such a visualization called?
  • indefinable
  • static
  • interactive
  • combo
  1. A data analyst creates a visualization with lots of contrast so that it is accessible for people with color vision deficiencies. What tool in Tableau does this?
  • Color tool
  • Contrast tool
  • Pan tool
  • Lasso tool
  1. You are working with the World Happiness data in Tableau. You use the Pan tool on the country of Japan. What is the result?
  • It changes your point of view to Japan.
  • It selects Japan.
  • It filters Japan so it cannot be seen.
  • It applies the current color scheme to Japan.
  1. Fill in the blank: A data analyst is working with the World Happiness data in Tableau. They use the _____ tool on the Marks shelf to display the population of each country on the map.
  • size
  • detail
  • label
  • tooltip
  1. What tool could you use in Tableau to show only those countries with a World Happiness score of 4.0 or less?
  • Attribute
  • Format
  • Filter
  • Dimension
  1. Fill in the blank: In Tableau, a(n) _____ visualization is one in which the audience can change what data they see.
  • static
  • interactive
  • combo
  • indefinable
  1. Fill in the blank: You are creating a visualization with the World Happiness data from Tableau. With the Label tool, you can display the _____ of a specific attribute for each country on the visualization.
  • color
  • location
  • truth
  • value
  1. A data analyst creates a visualization in Tableau showing their company’s quarterly sales data. They color all the items that have made a profit green and those in which they have a loss red. In addition, they intensify the color based on the magnitude of the profit or loss. What tool are they using?
  • Diverging palette
  • Value overlay
  • Conditional formatting
  • Color attribute

Week 3 – Crafting data stories

You are preparing to communicate to an audience about an analysis project. You consider the roles that your audience members play and their stake in the project. What aspect of data storytelling does this scenario describe?

  • Theme
  • Engagement
  • Takeaways
  • Discussion
  1. A data analyst wants to communicate to others about their analysis. They ensure the communication has a beginning, a middle, and an end. Then, they confirm that it clearly explains important insights from their analysis. What aspect of data storytelling does this scenario describe?
  • Spotlighting
  • Takeaways
  • Narrative
  • Setting
  1. A data analyst prepares to communicate to an audience about an analysis project. They consider what the audience members hope to do with the data insights. This describes establishing the setting.
  • True
  • False
  1. When designing a dashboard, how can data analysts ensure that charts and graphs are most effective? Select all that apply.
  • Incorporate all of the data points from the analysis
  • Make good use of available space
  • Place them in a balanced layout
  • Include as many visual elements as possible
  1. What are the key differences between tiled and floating items in Tableau?
  • Tiled items create a single-layer grid that contains no overlapping elements; floating items can be layered over other objects.
  • Tiled items are connected by straight lines; floating items are unconnected.
  • Tiled items always have a square layout; floating items are always based on circles.
  • Tiled items can be layered over other objects; floating items create a single-layer grid that contains no overlapping elements.
  1. A data analyst creates a scatter plot in Tableau and notices an outlier. What should they do next?
  • Use a filter to highlight the outlier, as it is more important than the rest of the data
  • Investigate the outlier to determine if it can lead to any important observations
  • Shift the outlier to the center of the other data points for conformity
  • Remove the outlier, as it is unlikely to lead to any important observations
  1. You are creating a dashboard in Tableau to share with stakeholders. Why might you decide to pre-filter the dashboard? Select all that apply.
  • To eliminate data points that do not support your conclusions
  • To save stakeholders the effort of filtering the dashboard themselves
  • To save stakeholders time in finding important data
  • To direct stakeholders to important data
  1. Fill in the blank: A data analyst is creating the title slide in a presentation. The data they are sharing is likely to change over time, so they include the _____ on the title slide. This adds important context.
  • key findings of the presentation
  • name of the data source
  • data analysts involved in the project
  • date of the presentation
  1. A data analyst wants to include a visual in their slideshow, then make some changes to it. Which of the following options will enable the analyst to edit the visual within the presentation without affecting its original file? Select all that apply.
  • Connect the original visual to the presentation via its URL
  • Copy and paste the visual into the presentation
  • Embed the visual into the presentation
  • Link the original visual within the presentation

Shuffle Q/A

  1. Fill in the blank: A data-storytelling narrative draws a connection between the data and the specific _____ of the project.
  • stakeholders
  • tasks
  • objectives
  • managers
  1. A data analyst scans the data to quickly identify the most important insights. This describes spotlighting.
  • True
  • False
  1. Fill in the blank: An important part of dashboard design is the placement of charts, graphs, and other visual elements. They should be _____, which means that they are balanced and make good use of available space.
  • constant
  • complete
  • clean
  • cohesive
  1. Fill in the blank: In Tableau, _____ items create a single-layer grid that contains no overlapping elements.
  • fixed
  • layered
  • tiled
  • floating
  1. While preparing a presentation, you decide to limit the number of lines and words on each slide. This will help keep your audience attentive to what you are saying rather than focusing on reading slides. What is the greatest number of lines and words you should use on each slide?
  • 2 lines and 15 words
  • 10 lines and 100 words
  • 5 lines and 25 words
  • 3 lines and 10 words
  1. You are creating a slideshow for a client presentation. There is a pivot table in a spreadsheet that you want to include. In order for the pivot table to update whenever the spreadsheet source file changes, how should you incorporate it into your slideshow? Select all that apply.
  • Copy and paste the pivot table
  • Insert a PDF of the pivot table
  • Link the pivot table
  • Embed the pivot table
  1. A data analyst wants to tell a story with data. As a second step, they focus on showing the story of the data to highlight the meaning behind the numbers. Which step of data storytelling does this describe?
  • Assemble word clouds
  • Create compelling visuals
  • Engage your audience
  • Tell an interesting narrative
  1. Which of the following questions do data analysts ask to make sure they will engage their audience? Select all that apply.
  • What information will convince the audience that my opinion is correct?
  • What roles do the people in this audience play?
  • What does the audience hope to do with the data insights?
  • What is the audience’s stake in the project?
  1. A data analyst links their visualizations to external spreadsheets containing the data being described. What is the purpose of doing this?
  • It allows changes to be made to the spreadsheet data without changing the visualizations.
  • It allows for the creation of multiple visualizations using the same dataset.
  • It allows the visualization to be edited without the spreadsheet data being affected.
  • It allows changes made to the spreadsheet data to automatically reflect in the visualizations.
  1. What three key components are required in a data storytelling narrative?
  • Stakeholders, analysts, and customers
  • Spotlighting, setting, and takeaways
  • Measurement, data, and analysis
  • Beginning, middle, and end
  1. You are designing a dashboard in Tableau. You choose a layout that allows objects to be layered over other items in the dashboard. What type of layout is this?
  • Tiled
  • Vertical
  • Horizontal
  • Floating
  1. On a scatterplot, what is the term for a point that lies far from the rest of the points?
  • An error
  • A filter
  • An outlier
  • An anomaly
  1. A data analyst wants to save stakeholders time and effort when working with a Tableau dashboard. They also want to direct stakeholders to the most important data. What process can they use to achieve both goals?
  • Pre-filtering
  • Pre-sorting
  • Pre-sizing
  • Pre-building
  1. Fill in the blank: An effective slideshow guides your audience through your main communication points, but it does not repeat every word you say. A best practice is to keep text to fewer than _____ lines and 25 words per slide.
  • 2
  • 10
  • 5
  • 15
  1. A data analyst has multiple points to show with the same visualization. What should they do to communicate these points effectively to their audience?
  • Save some of the points to use in another presentation
  • Create a new visualization for each point they need to make
  • Limit the number of points to only a few that are the most relevant
  • Identify each point on the same visualization using arrows
  1. A data analyst wants to tell a story with data. As a first step, they consider who will be listening to the data story and focus on capturing and holding their audience's interest. Which step of data storytelling does this describe?
  • Assemble word clouds
  • Tell an interesting narrative
  • Engage your audience
  • Create compelling visuals
  1. You are preparing to communicate to an audience about an analysis project. You know that audience engagement is a crucial part of getting them to listen to what you have to say. You compile a list of the insights from your work and review it to identify both the key takeaways and the details that are less relevant. What process does this describe?
  • Narrative
  • Spotlighting
  • Discussion
  • Takeaways
  1. A data analyst is designing a dashboard. They make sure that the charts, graphs, and other visual elements are balanced and make good use of available space. What dashboard best practice does this describe?
  • Detail
  • Completeness
  • Cohesion
  • Labeling
  1. You are sharing your Tableau dashboard with stakeholders. What process can you implement so the stakeholders do not need to filter the dashboard themselves?
  • Pre-sizing
  • Pre-filtering
  • Pre-sorting
  • Pre-building
  1. You want to include a visual in your slideshow that will update automatically when its original source file updates. Which of the following actions will enable you to do so?
  • Copy and paste the visual into the presentation
  • Take a screenshot of the visual and paste it into the presentation
  • Link the original visual within the presentation
  • Embed the visual into the presentation
  1. An analyst is designing a dashboard. In order for it to be effective, they make sure that the charts, graphs, and other visual elements are balanced. What else should they do to make the dashboard design cohesive?
  • Fill it with color.
  • Make good use of space.
  • Put in lots of detail.
  • Make sure the dashboard is complete.
  1. Fill in the blank: When a data analyst notices a data point that is very different from the norm in a scatterplot, the best course of action is to _____ the outlier.
  • investigate
  • move
  • remove
  • hide
  1. You are working on a huge dataset and visualizing your data with Tableau. As a next step, you want to focus on only the data that is most important. Which Tableau tool can you use to limit the data displayed on the dashboard?
  • Pre-filtering
  • Pre-building
  • Pre-sorting
  • Pre-sizing
  1. Fill in the blank: An effective slideshow guides your audience through your main communication points, but it does not repeat every word you say. A best practice is to keep text to fewer than five lines and _____ words per slide.
  • 50
  • 5
  • 100
  • 25
  1. A data analyst embeds their visualizations in their slideshow. These visualizations are based on data contained in external spreadsheets. Why might the analyst do this rather than copying and pasting the visualization?
  • Subsequent changes made to the spreadsheet data will automatically be reflected in the slideshow.
  • The visualizations will remain with the spreadsheet file instead of the presentation.
  • Subsequent changes made to the spreadsheet data will not affect the visualization.
  • The visualizations can be edited directly in the slideshow.

Week 4 – Developing presentations and slideshows

You are presenting your theory about the correlation between recent sales increases and a current pop culture trend. When is the best time to establish your presentation’s hypothesis for the audience?

  • During the introduction
  • Before the conclusion
  • During the conclusion
  • Before the presentation
  1. A data analyst gives a presentation about predicting upcoming investment opportunities. How does establishing a hypothesis help the audience understand their predictions?
  • It describes the data thoroughly
  • It summarizes the findings succinctly
  • It visualizes the data clearly and concisely
  • It provides context about the presentation’s purpose
  1. According to the McCandless Method, what is the most effective way to first present a data visualization to an audience?
  • Answer obvious questions before they’re asked
  • State the insight of the graphic
  • Tell the audience why the graphic matters
  • Introduce the graphic by name
  1. You are preparing for your first presentation at a new job. Which strategies can help you combat nervousness about presentations? Select all that apply.
  • Improvise your material to speak naturally
  • Practice and prepare your material
  • Do breathing exercises to calm your body down
  • Channel your nervousness into excitement about your topic
  1. You are preparing for a presentation and want to make sure your nerves don’t distract you from your presentation. Which practices can help you stay focused on an audience? Select three that apply.
  • Speak as quickly and briefly as possible
  • Use short sentences
  • Keep the pitch of your voice level
  • Be mindful of nervous habits
  1. You are running a colleague test with your coworkers. One coworker points out that she doesn’t understand one of your graphs. What can you do to prepare for presenting to your stakeholders? Select all that apply.
  • Redesign the graph
  • Elaborate on the data from the graph
  • Move the graph to a later slide
  • Remove the graph
  1. Your stakeholders express concern that the results of your analysis are very different from the predictions they made last year. Which kind of objection are they making?
  • Data
  • Analysis
  • Presentation skills
  • Findings
  1. A stakeholder objects to the steps of your analysis. What are some appropriate ways to respond to this objection? Select all that apply.
  • Explain why you think any discrepancies exist
  • Take steps to investigate your analysis question further
  • Communicate the assumptions you made in your analysis
  • Defend the results of your analysis
  1. You notice that your audience is not as engaged as you’d like during your Q&A. Which of the following are ways to get them more involved?
  • Keep your pitch level
  • Repeat your key findings
  • Wait longer for the audience to ask questions
  • Ask them for insights

Shuffle Q/A

  1. A purchaser at your company wants to optimize the price they will pay to order office supplies for the coming year. Which of the following is a good initial hypothesis to test in order to help the purchaser optimize their spending? Select all that apply.
  • Office supply prices increase seasonally.
  • Office supply prices remain the same throughout the year.
  • The budget for office supplies should increase.
  • The budget for office supplies can remain the same.
  1. According to the McCandless Method, when should you present the data that supports insights?
  • After stating insights
  • Before stating insights
  • At the end of the presentation
  • At the beginning of the presentation
  • While stating insights
  1. An analyst introduces a graph to their audience to explain an analysis they performed. Which strategy would allow the audience to absorb the data visualizations? Select all that apply.
  • Practicing breathing exercises
  • Improving body language
  • Using the five second rule
  • Starting with broad ideas
  1. During a presentation, one of your stakeholders expresses concern that you did not control for differences in the data. Which kind of objection are they making?
  • Findings
  • Presentation Skills
  • Data
  • Analysis
  1. During a meeting, a colleague on your team points out a flaw in your analysis that you had not noticed before. What steps should you take to respond to their objection? Select all that apply.
  • Hide evidence that you were incorrect
  • Follow up with your colleague
  • Investigate the issue
  • Acknowledge that their objection is valid
  1. You are presenting to a large audience and want to keep everyone engaged during your Q&A. What can you do to ensure your audience doesn’t grow disinterested despite its size?
  • Ask your audience for insights
  • Wait longer for the audience to ask questions
  • Repeat your key findings
  • Keep your pitch level
  1. Which of the following statements is true about using a hypothesis in your data presentation?
  • Include the hypothesis in a summary at the end of your presentation
  • Choose a hypothesis your audience will like
  • Include a new hypothesis before every data visualization
  • Present the hypothesis early in your presentation
  1. Why is it important to state the insights from your graphic when using the McCandless Method?
  • To get everyone on the same page before you give supporting details
  • To make sure your audience understands why the data matters
  • To ensure that you establish credibility as a serious data analyst
  • To add a strong finish to your presentation
  1. A researcher is presenting the data for their study. What can they do to ensure their presentation is impactful?
  • Ensure their delivery is as well executed as their analysis
  • Suppress their excitement to remain passive and neutral
  • Start with really narrow ideas and work toward broad ideas
  • Focus on the data instead of focusing on presentation skills
  1. You run a colleague test on your presentation before getting in front of an audience. Your coworker asks a question about a section of your analysis, but addressing their concern would mean adding information you didn’t plan to include. How should you proceed with building your presentation? Select all that apply.
  • Leave the presentation as-is
  • Keep the concern in mind and anticipate that stakeholders may ask the same question
  • Remove the section of the analysis that prompted the question
  • Expand your presentation by including the information
  1. One of your stakeholders tried to reproduce the work you presented by using a copy of your scripts and was unable to get the same results. Which kind of objection are they making?
  • Data
  • Analysis
  • Presentation skills
  • Findings
  1. One of your co-workers is giving a presentation on the results of an analysis the two of you have been working on. Someone in the audience points out that the data system you used has frequent errors. How should you deal with this comment?
  • Assume you were given valid data
  • Tell them they should have looked at the appendix
  • Explain how you cleaned and formatted the data
  • Ignore the question and move on
  1. Why should you repeat questions that you receive during your presentation? Select all that apply.
  • It helps you take up more time.
  • It gives you a moment to think.
  • It allows you to ensure you understood the question.
  • It ensures you focus on the person asking the question instead of the whole audience.
  1. You give a presentation on your latest data analysis and receive feedback from the audience that they did not understand the context of the analysis. What might have caused this problem? Select all that apply.
  • Your hypothesis was stated early.
  • Your hypothesis was not included.
  • Your hypothesis was a disprovable theory.
  • Your hypothesis was stated too late.
  1. According to the McCandless Method, what is the most effective way to finish presenting data to an audience?
  • Call out data to support your insights
  • Tell your audience why it matters
  • Answer any obvious questions before they’re asked
  • State the insight of your graphic
  1. You are putting together a list of your peers to run colleague tests with. What are some qualities of good peers to target?
  • They are very different from your audience
  • They are familiar with your previous work
  • They worked on the analysis with you
  • They have no prior knowledge of your work
  1. Your stakeholders are concerned that you inappropriately removed data during the initial phases of your project. Which kind of objection are they making?
  • Findings
  • Data
  • Presentation Skills
  • Analysis
  1. You are presenting to your stakeholders an analysis of your company’s latest quarterly earnings. Your stakeholders express concern that your projections for next quarter are lower than expected. What are appropriate ways to respond to these objections? Select all that apply.
  • Explain why you think the discrepancies exist
  • Repeat the steps you took
  • Take steps to investigate your analysis question further
  • Communicate the assumptions you made in your approach
  1. After a presentation, one of your peers points out that you were unable to answer audience questions very well. Which step can you take to improve how you answer questions?
  • Answer questions immediately with highly detailed answers
  • Start thinking of answers during the question
  • Focus your responses on people that ask questions instead of the whole audience
  • Repeat questions to ensure you understood
  1. What is the final step, or the “so what?” phase, of the McCandless Method? This is the point where you present the possible business impact of the solution and clear actions stakeholders can take.
  • State the insight of the graphic
  • Tell your audience why it matters
  • Call out data to support that insight
  • Answer obvious questions before they’re asked
  1. During a presentation, you stop and wait for five seconds after displaying a new graphic. According to the McCandless Method, what should you do after that delay?
  • Ask if there are any questions
  • Wait another five seconds
  • Move on to the next topic
  • Return to the previous content
  1. You are giving a presentation to the leadership of a local community organization. How can you effectively communicate your findings to them?
  • Focus on what you found interesting
  • Focus on what the audience needs to hear
  • Focus on specific technical details of your analysis
  • Focus on speaking without any pauses
  1. You are on a team of analysts presenting to your stakeholders. Your teammate responds to an objection about your steps of analysis by repeating the steps and then getting defensive when the stakeholders don’t seem to understand. What could they have done to respond to the objection more appropriately? Select all that apply.
  • Promise to investigate your analysis question further
  • Remind the stakeholders of your successes
  • Acknowledge that the objection is valid
  • Describe the approach you took in your analysis
  1. You are getting ready to give the biggest presentation of your career. Which of the following methods might help you prepare to give the presentation? Select all that apply.
  • Write a script and repeat it in your head
  • Hold a dress rehearsal at the presentation location
  • Avoid thinking about the presentation
  • Visualize giving the presentation
  1. You are presenting to your stakeholders and want to convey confidence. How should your body language reflect your composure? Select all that apply.
  • Stand up straight and be still
  • Gesture enthusiastically to illustrate each point
  • Make eye contact with audience members
  • Pace as you speak to the audience
  1. As part of an internship, you are giving a presentation of your work to the rest of the department. Why might you want to perform a colleague test? Select all that apply.
  • It helps you come up with highly detailed answers.
  • It can help find places your audience might get confused.
  • It can help you discover jargon to include.
  1. You are introducing a data visualization during your presentation and are concerned that it may overwhelm your audience. How can you allow the audience to process the information when you first introduce the visualization?
  • Wait five seconds
  • Thoroughly explain the context
  • Describe each graph quickly
  • Define each parameter
  1. You are preparing to present in front of a large audience. Which of the following is a best practice for speaking to an audience?
  • Speak as quickly as possible
  • Take as few pauses as possible
  • Take long pauses between sentences
  • Speak at a relaxed pace in short sentences
  1. You are running a colleague test with your coworkers. One coworker points out that your data has limitations. What can you do to prepare to explain the limitations of your data? Select all that apply.
  • Consider the context
  • Critically analyze any correlations
  • Understand the strengths and weaknesses of your tools
  • Be ready with industry jargon and acronyms

Course challenge

Next, you decide on your data narrative’s characters, setting, plot, big reveal, and aha moment. The characters are the people affected by your story. This includes your stakeholders, Gaea’s customers, and Gaea’s potential future customers. For the setting, you describe the current situation, potential tasks, and background information about the analysis project.

As you begin to work on the plot for the data narrative, which of the following ideas would you include? Select all that apply.

  • Why it’s important for Gaea to increase its cars’ battery range by 2025
  • How your data analysis can help Gaea solve its business problems
  • The challenges associated with the current lack of vehicle charging stations
  • A list of your recommendations and details about why they will help Gaea be successful

After creating data visualizations about the current state of the electric vehicle market, you turn to projections. You want to communicate to stakeholders about the importance of longer vehicle battery range to consumers.

Your team analyzes data from a consumer survey that investigated the importance of longer battery range when choosing whether to purchase an electric car. The current average battery range is about 210 miles. By 2025, that distance is expected to grow to 450 miles per charge.

You create the following pie chart:

After reviewing your pie chart, you realize that it could be improved. How do you make this chart more effective?

  • Write a longer title to add more detail about the data the pie chart contains
  • Remove the labels for the number of miles per charge consumers will require before purchasing an electric vehicle
  • Resize the pie segments so they visually show the different values
  • Add an x-axis and y-axis to provide additional explanation about the data

As a final step in the data-sharing process, you think about how to respond during the Q&A session. What strategies will you employ when answering questions? Select all that apply.

  • Involve your whole audience
  • Listen to the whole question, and repeat it, if necessary
  • Understand the context of the question
  • Provide detailed, comprehensive responses
  1. Scenario 1, questions 1-9

You have been working as a junior data analyst at Bowling Green Business Intelligence for nearly a year. Your supervisor, Kate, tells you that she believes you are ready for more responsibility. She asks you to lead an upcoming client presentation. You will be responsible for creating the data story, identifying the right tools to use, building the slideshow, and delivering the presentation to stakeholders.

Your client is Gaea, an automotive manufacturer that makes eco-friendly electric cars. For the past year, you have been working with the data team in Gaea’s Bowling Green, Kentucky, headquarters. For the presentation, you will engage the data team, as well as its regional sales representatives and distributors. Your presentation will inform their business strategy for the next three to five years.

You begin by getting together with your team to discuss the data story you want to tell. You know the first step in data storytelling is to engage your audience.

Fill in the blank: A big part of engagement is knowing how to eliminate less important details. So, you use spotlighting to _____ the data in order to identify the most important insights.

  • recheck
  • scan
  • study
  • research
  1. Scenario 1, continued

After you identify the most important insights, it’s time to create your primary message. Your team’s analysis has revealed three key insights:

Electric vehicle sales demand is expected to grow by more than 400% by 2025.

The number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Currently, there are many locations with so few charging stations that electric car owners would run out of power when traveling between stations.

Vehicle battery range is also a significant factor for consumers. In 2020, the average battery range was 210 miles. However, the vast majority of survey respondents report they will not buy an electric car until the battery range is at least 300 miles per charge.

Based on these insights, you create your primary message. Which of the following reflect the expectations of a primary message?

  • The number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Therefore, Gaea must begin building vehicle charging stations
  • Although electric vehicle sales demand is on the rise, low availability of charging stations and short battery range are significant hurdles that Gaea must overcome
  • Electric vehicle sales demand is expected to grow by more than 400% by 2025. However, the number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Currently, there are many locations with so few charging stations that electric car owners would run out of power when traveling between stations. Vehicle battery range is also a significant factor for consumers. In 2020, the average battery range was 210 miles. However, the vast majority of people say they will not buy an electric car until the battery range is at least 300 miles per charge
  • Electric vehicle demand is skyrocketing
  1. Scenario 1, continued

Next, you decide on your data narrative’s characters, setting, plot, big reveal, and aha moment. During the narrative, you want to communicate to your stakeholders about the challenges associated with the current lack of vehicle charging stations and why it's important for Gaea to increase its cars’ battery range by 2025.

Information about charging stations and the need to increase battery range will be part of the setting of your data story.

  • True
  • False
  1. Scenario 1, continued

Now, it’s time to consider which tools to use to create data visualizations that will clearly communicate the results of your analysis. You and your team decide to make both spreadsheet charts and Tableau data visualizations. In addition, you want to provide them with a tool that will achieve the following goals:

Organize multiple datasets about electric vehicle battery ranges into a central location

Enable tracking and analysis of electric vehicle data

Simplify data visualizations about the number of available charging stations using maps of the different geographies

What tool do you create for your stakeholders?

  • Dashboard
  • Spreadsheet
  • Database
  • Algorithm
  1. Now that you have finished planning the data story with your team, it’s time to create data visualizations. First, you consider electric vehicle sales worldwide in 2015 compared to 2020. You use a spreadsheet to create the following bar graph to compare the two values:

You want to add a label to represent the scale (total count by year) of electric vehicle sales. Where on the graph do you label these values?

  • The colors
  • The vertical bars
  • The y-axis
  • The x-axis
  1. Next, you explore how access to public car-charging stations is influencing electric vehicle purchases. As your analysis has revealed, there are many areas without enough places for people to plug in and charge their cars. This lack of charging stations has a negative impact on demand for electric cars and overall vehicle sales.

You use Tableau to create the following draft of a visualization, which organizes the charging station data geographically:

After reviewing your draft, you realize that it could be improved.

Fill in the blank: To improve your draft, you select more varied hues and make the color intensity stronger. In addition, you choose darker _____ in order to reflect more light.

  • views
  • values
  • visuals
  • variables
  1. Scenario 1, continued

Now, you want to highlight what your team’s analysis discovered about the number of charging stations available compared to the number of cars purchased. Your data has confirmed that the lack of charging stations causes the effect of fewer car sales. To communicate this effectively, you will need to convey causation to the stakeholders.

You explain that causation is the measure of the degree to which two variables move in relationship to each other. In the case of Gaea’s business, charging station numbers and car sales move in the same direction.

  • True
  • False
  1. Scenario 1, continued

Once you finish creating data visualizations about the current state of the electric vehicle market, you turn to projections for the future. You want to communicate to stakeholders about the importance of longer vehicle battery range to consumers.

Your team’s data includes feedback from a consumer survey that investigated the importance of longer battery range when choosing whether to purchase an electric car. The current average battery range is about 210 miles. By 2025, that distance is expected to grow to 450 miles per charge.

You create the following pie chart:

Fill in the blank: After reviewing your pie chart, you realize that it could be improved. You resize the _____ so they visually show the different values.

  • labels
  • axes
  • segments
  • values
  1. Scenario 1, continued

It’s time to build your Tableau dashboard for stakeholders. You consider what type of layout to use.

You decide that you want to be able to adjust the width of the views and the data visualizations about electric vehicle sales, charging stations, and battery range. Which type of layout will enable you to do that?

  • Vertical layout
  • Circular layout
  • Diagonal layout
  • Horizontal layout
  1. Scenario 2, questions 10-15

You have created your narrative and visuals, so now it’s time to build a professional and appealing slideshow. You choose a theme that matches the tone of your presentation. Then, you create a title slide with a title, subtitle, and the date.

Next, you create the following slide that compares electric vehicle sales in 2015 and 2020:

After reviewing your slide, you realize that it could be improved. What steps do you take to make the two text boxes beneath the header more effective? Select all that apply.

  • Edit the text to fewer than five lines total
  • Ensure the text does not simply repeat the words you plan to say
  • Use abbreviations to reduce the amount of text
  • Edit the text to fewer than 25 words total
  1. Scenario 2, continued

You then create the following slide to demonstrate the challenges associated with battery range and charging stations:

After reviewing your slide, you realize that the visual elements could be improved. A good solution would be for you to choose one data visualization to share on this slide, then create another slide for the second data visualization.

  • True
  • False
  1. Scenario 2, continued

You complete your slideshow and share it with your team. Once it is approved by your supervisor, you begin preparing to give your presentation. You consider maintaining good posture, being aware of nervous habits, and making eye contact. In addition, you think about how you will explain the data visualizations.

One of the strategies you practice is the five-second rule. What are some key aspects of this rule? Select all that apply.

  • Ask your audience if they understand the data visualization
  • Be prepared to explain the data visualization
  • Tell your audience the conclusion that you want them to understand
  • Take no more than five seconds to explain the data visualization
  1. Scenario 2, continued

Next, you prepare for the question-and-answer session that will follow your presentation. To predict what questions they may ask, you do a colleague test of your presentation. You should choose a colleague who has deep expertise in the electric vehicle industry.

  • True
  • False
  1. Scenario 2, continued

Now that you have some idea of the questions the stakeholders will ask, you and a team member consider different objections that might arise.

Your team member asks you how you will respond if someone from Gaea questions your data-cleaning process. How do you prepare for this objection? Select all that apply.

  • Keep a detailed log of your data-cleaning process
  • Practice answering questions about your data-cleaning process
  • Add your data-cleaning log to the slideshow appendix
  • Be prepared to explain why data cleaning is not relevant at this stage of the project
  1. Scenario 2, continued

The big day has arrived, and you have just finished giving your presentation to the Gaea team. It’s now time for the question-and-answer session, and a stakeholder asks you a very detailed question about one specific electric vehicle charging station initiative.

You listen to the whole question, then repeat it. For what reasons is this important? Select all that apply.

  • It ensures the entire audience has heard the question, in case they did not when it was originally asked
  • It enables you to rephrase it in a way that is easier to answer
  • It helps you confirm that you understand the question
  • It gives the stakeholder a chance to correct you if you misunderstand

Shuffle Q/A

  1. Scenario 1, questions 1-9

You have been working as a junior data analyst at Bowling Green Business Intelligence for nearly a year. Your supervisor, Kate, tells you that she believes you are ready for more responsibility. She asks you to lead an upcoming client presentation. You will be responsible for creating the data story, identifying the right tools to use, building the slideshow, and delivering the presentation to stakeholders.

Your client is Gaea, an automotive manufacturer that makes eco-friendly electric cars. For the past year, you have been working with the data team in Gaea’s Bowling Green, Kentucky, headquarters. For the presentation, you will engage the data team, as well as its regional sales representatives and distributors. Your presentation will inform their business strategy for the next three-to-five years.

You begin by getting together with your team to discuss the data story you want to tell. You know the first step in data storytelling is to engage your audience.

You use spotlighting to help you identify the most important insights. Which of the following activities are involved with spotlighting? Select all that apply.

  • Determining the data’s partiality
  • Identifying connections or patterns
  • Finding ideas or concepts that keep arising
  • Noticing repeated words or numbers
  1. Scenario 1, continued

Once you have identified the most important insights, it’s time to create your primary message. Your team’s analysis has revealed three key insights:

Electric vehicle sales demand is expected to grow by more than 400% by 2025.

The number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Currently, there are many locations with so few charging stations that electric car owners would run out of power when traveling between stations.

Vehicle battery range is also a significant factor for consumers. In 2020, the average battery range was 210 miles. However, the vast majority of survey respondents report they will not buy an electric car until the battery range is at least 300 miles per charge.

Based on these insights, you create your primary message. What are the expectations of a primary message? Select all that apply.

  • Clear
  • Direct
  • Comprehensive
  • Subtle
  1. Scenario 1, continued

Next, you decide on your data narrative’s characters, setting, plot, big reveal, and aha moment. During the narrative, you want to communicate to your stakeholders about the challenges associated with the current lack of vehicle charging stations and why it's important for Gaea to increase its cars’ battery range by 2025.

In which part of your data narrative would you include information about charging stations, the need to increase battery range, and why it’s important for Gaea to increase its cars’ battery range?

  • Aha moment
  • Setting
  • Plot
  • Big reveal
  1. Scenario 1, continued

It’s time to build your Tableau dashboard for stakeholders. You consider what type of layout to use.

Describe the differences between vertical and horizontal layouts. Select all that apply.

  • Vertical layouts prevent items from being layered over other objects
  • Vertical layouts adjust the height of the views and objects contained
  • Horizontal layouts adjust the width of the views and objects contained
  • Horizontal layouts prevent items from being layered over other objects
  1. Scenario 2, questions 10-15

You have created your narrative and visuals, so now it’s time to build a professional and appealing slideshow. You choose a theme that matches the tone of your presentation. Then, you create a title slide with a title, subtitle, and the date.

Next, you create the following slide to communicate information about electric vehicle sales in 2015 compared to 2020:

Alt-text: Slideshow with bar chart of electric vehicle sales from 2015 and 2022. 2022 had higher sales. There are also multiple sentences at the bottom of the slide and another piece of descriptive text near the chart.

To improve the slide, you remove the text box at the bottom. For what reasons will this make your slide more effective? Select all that apply.

  • Slide text should be fewer than 25 words total
  • The text shouldn’t simply repeat the words you say
  • The font size is too small for your audience to read
  • Slide text should be no more than 10 lines total
  1. Scenario 2, continued

You complete your slideshow and share it with your team. Once it is approved by your supervisor, you begin preparing to give your presentation. You consider maintaining good posture, being aware of nervous habits, and making eye contact. In addition, you think about how you will speak.

What strategies can help you speak effectively? Select all that apply.

  • Building in intentional pauses to give your audience time to think about what you have just said
  • Speaking quickly so you are sure to have time to include all important data points
  • Using short words and sentences
  • Keeping the pitch of your sentences level so that your statements are not confused for questions
  1. Scenario 2, continued

Next, you prepare for the question-and-answer session that will follow your presentation. What methods help you consider any limitations of your data? Select all that apply.

  • Understand the strengths and weaknesses of the tools
  • Eliminate the outliers
  • Look at the context
  • Critically analyze the correlations
  1. Scenario 2, continued

The big day has arrived, and you finish your presentation to the Gaea team. In the question-and-answer session, a stakeholder asks you a very detailed question about a car battery range project that's still in development.

What strategies do you use in order to respond effectively? Select all that apply.

  • Be certain that you understand the context of the question that the stakeholder is asking
  • Involve the whole audience when you respond to the stakeholder
  • Keep your response short and to the point, then add detail if there are follow-up questions
  • Give yourself extra time by planning your thoughtful response when the stakeholder begins speaking
  1. Scenario 1, continued

Your team’s analysis has revealed three key insights:

Electric vehicle sales demand is expected to grow by more than 400% by 2025.

The number of publicly available vehicle charging stations is a significant factor in consumer buying decisions. Currently, there are many locations with so few charging stations that electric car owners would run out of power when traveling between stations.

Vehicle battery range is also a significant factor for consumers. In 2020, the average battery range was 210 miles. However, the vast majority of survey respondents report they will not buy an electric car until the battery range is at least 300 miles per charge.

Fill in the blank: Based on these insights, you create a clear and direct _____, which will guide your data story.

  • business case
  • spotlight
  • primary message
  • specific question
  1. Scenario 1, continued

Now, it’s time to consider which tools to use to create data visualizations that will clearly communicate the results of your analysis. You and your team decide to make both spreadsheet charts and Tableau data visualizations. In addition, you agree to build a dashboard to share live, incoming data with your stakeholders. This will help them achieve the following goals:

Organize multiple datasets about electric vehicle battery ranges into a central location

Enable tracking and analysis of electric vehicle data

Simplify data visualizations about the number of available charging stations using maps of the different geographies

Another key benefit of dashboards is that they enable you to maintain control of your data narrative.

  • True
  • False
  1. Next, you explore how access to public car-charging stations is influencing electric vehicle purchases. As your analysis has revealed, there are many areas without enough places for people to plug in and charge their cars. This lack of charging stations has a negative impact on demand for electric cars and overall vehicle sales.

You use Tableau to create the following draft of a visualization, which organizes the charging station data geographically:

After reviewing your draft, you realize that it could be improved. What steps do you take to make your map more effective? Select all that apply.

  • Select more varied hues
  • Add more space between each state
  • Make the intensity of the colors stronger
  • Choose darker values
  1. Scenario 1, continued

Now, you want to highlight what your team’s analysis discovered about the number of charging stations available compared to the number of cars purchased. Your data has confirmed that the lack of charging stations causes the effect of fewer car sales. To communicate this effectively, you will need to convey causation to the stakeholders.

How do you explain causation?

  • Causation involves how often data values fall into certain ranges. In the case of Gaea’s business, data about the number of charging stations will fall into ranges associated with car sales.
  • Causation is the measure of the degree to which two variables move in relationship to each other. In the case of Gaea’s business, charging station numbers and car sales move in the same direction.
  • Causation involves everything associated with an event. In the case of Gaea’s business, the lack of charging stations has a negative effect on the entire automotive marketplace.
  • Causation is when an action directly leads to an outcome, such as a cause-effect relationship. In the case of Gaea’s business, the lack of charging stations directly leads to the outcome of fewer car sales.
  1. Scenario 2, continued

You then create the following slide to demonstrate the challenges associated with battery range and charging stations:

After reviewing your slide, you realize that the visual elements could be improved. You do this by first choosing one data visualization to share on this slide, then creating another slide for the second data visualization.

Fill in the blank: In addition, you make sure to use _____ font sizes and colors for all of your data visualization titles.

  • consistent
  • unique
  • colorful
  • different
  1. Scenario 2, continued

Now that you have some idea of the questions the stakeholders will ask, you consider potential objections. You and a team member consider different objections that might arise. Your team member asks you how you will respond if someone from Gaea has an objection that you haven’t prepared for.

You say that you will respond professionally using the information you currently have available in order to move quickly past the objection.

  • True
  • False
  1. Scenario 1, questions 1-9

You have been working as a junior data analyst at Bowling Green Business Intelligence for nearly a year. Your supervisor, Kate, tells you that she believes you are ready for more responsibility. She asks you to lead an upcoming client presentation. You will be responsible for creating the data story, identifying the right tools to use, building the slideshow, and delivering the presentation to stakeholders.

Your client is Gaea, an automotive manufacturer that makes eco-friendly electric cars. For the past year, you have been working with the data team in Gaea’s Bowling Green, Kentucky, headquarters. For the presentation, you will engage the data team, as well as its regional sales representatives and distributors. Your presentation will inform their business strategy for the next three-to-five years.

You begin by getting together with your team to discuss the data story you want to tell. You know the first step in data storytelling is to engage your audience.

A big part of audience engagement is knowing how to eliminate less important details. What practice do you use to scan quickly through the data in order to identify the most important insights?

  • Balancing
  • Ranking
  • Filtering
  • Spotlighting
  1. Scenario 1, continued

Now that you have finished planning the data story with your team, it’s time to create data visualizations. First, you consider electric vehicle sales worldwide in 2015 compared to 2020. You use a spreadsheet to create the following bar graph to compare the two values:

You add information on the x-axis to represent a scale of values for the total electric vehicle sales and on the y-axis to represent the time periods (2015 and 2020).

  • False
  • True
  1. Scenario 2, continued

You then create the following slide to demonstrate the challenges associated with battery range and charging stations:

After reviewing your slide, you realize that the visual elements could be improved. Which of the following options would help you make the visual elements on this slide more effective? Select all that apply.

  • Use more colors in the map
  • Provide a detailed written explanation of both data visualizations
  • Choose one data visualization to share on this slide, then create another slide for the second data visualization
  • Use a consistent font size and color for data visualization titles
  1. Scenario 2, continued

You complete your slideshow and share it with your team. Once it is approved by your supervisor, you prepare to give your presentation. You consider presentation best practices: maintaining good posture, being aware of nervous habits, and making eye contact. In addition, you think about how you will present your data visualizations.

What strategies can help you explain the data visualizations effectively? Select all that apply.

  • Channel your excitement
  • Start with the broader ideas
  • Use the five-second rule
  • Speak quickly to save time and cover all important data points

Course 7 – Data Analysis with R Programming

Week 1 – Programming and data analytics

What are the benefits of using a programming language for data analysis? Select all that apply.

  • It is faster to clean data.
  • It is easy to share code.
  • It does not require data cleaning.
  • It does not require specific syntax.

What process does a data analyst use to instruct a computer to perform sets of actions?

  • Analytics
  • Programming
  • Filtering
  • Visualization

A team of data analysts is working on a complex analysis. The team needs to quickly process lots of data. They also need to easily reproduce and share every step of their analysis. What should they use for the analysis?

  • A dashboard
  • A database
  • Structured query language
  • The R programming language

What is a type of application that brings together all the tools a data analyst may want to use in a single place?

  • Spreadsheet
  • Integrated development environment
  • Database
  • Dashboard

Which of the following statements about RStudio’s integrated development environment are correct? Select all that apply.

  • RStudio only works on Windows.
  • RStudio panes are customizable.
  • RStudio includes a built-in console.
  • RStudio is closed-source.

A data analyst writes the code summary(penguins) in order to display a summary of the penguins dataset. Where in RStudio can the analyst execute the code? Select all that apply.

  • R console pane
  • Source editor pane
  • Files tab
  • Environment pane

1. A data analyst uses words and symbols to give instructions to a computer. What are the words and symbols known as?

  • Coded language
  • Function language
  • Programming languages
  • Syntax languages
  1. Many data analysts prefer to use a programming language for which of the following reasons? Select all that apply.
  • To save time
  • To clarify the steps of an analysis
  • To easily reproduce and share an analysis
  • To choose a topic for analysis
  1. Fill in the blank: _____ code is freely available and may be modified and shared by the people who use it.
  • Open-ended
  • Open-source
  • Open-access
  • Open-syntax
  1. Which of the following are benefits of using R for data analysis? Select all that apply.
  • Create high-quality data visualizations
  • Define a problem and ask the right questions
  • Process lots of data
  • Reproduce and share an analysis

5. Fill in the blank: A data analyst wants to quickly create visualizations and then share them with a teammate. They can use _____ for the analysis.

  • the R programming language
  • a dashboard
  • structured query language
  • a database
  1. RStudio’s integrated development environment includes which of the following? Select all that apply.
  • A console for executing commands
  • An area to manage loaded data
  • A viewer for playing videos
  • An editor for writing code
  1. Fill in the blank: When you execute code in the source editor, the code automatically also appears in the _____.
  • R console
  • plots tab
  • environment pane
  • files tab
  1. A data analyst is working with spreadsheet data. The analyst imports the data from the spreadsheet into RStudio. Where in RStudio can the analyst find the imported data?
  • Source editor pane
  • Environment pane
  • R console pane
  • Plots tab

Shuffle Q/A

  1. Fill in the blank: _____ are the words and symbols you use to write instructions for computers.
  • Code languages
  • Programming languages
  • Syntax languages
  • Variable languages
  1. A data analyst wants to use a programming language that they can modify. What type of programming language should they use?
  • Console-based
  • Data-centric
  • Community-oriented
  • Open-source
  1. A data analyst needs to quickly create a series of scatterplots to visualize a very large dataset. What should they use for the analysis?
  • A dashboard
  • The R programming language
  • A slide presentation
  • Structured query language
  1. What type of software is RStudio?
  • Integrated development environment
  • Programming language
  • Syntax
  • Pane
  1. A data analyst wants to write R code where they can access it again after they close their current session in RStudio. Where should they write their code?
  • R console
  • Files tab
  • History tab
  • Source editor
  1. What are the benefits of using a programming language for data analysis? Select all that apply.
  • They store steps of your analysis for future use.
  • They have no specific syntax.
  • They save time cleaning data.
  • It does not require data cleaning
  1. Which of the following statements about the R programming language are correct? Select all that apply.
  • It can create world-class visualizations
  • It makes analysts spend more time cleaning data and less time analyzing
  • It can process large amounts of data
  • It relies on spreadsheet interfaces to clean and manipulate data
  1. A data analyst is searching for a tool that gives them the most power to customize the visualizations they use in their analysis. What tool should they use?
  • The R programming language
  • Tableau
  • Spreadsheets
  • SQL
  1. Which of the following statements about RStudio’s integrated development environment are correct? Select all that apply.
  • RStudio is unable to produce visualizations.
  • RStudio is built specifically for working with R.
  • The layout of panes in RStudio is fixed.
  • RStudio helps with file management.
  1. R users share custom solutions they have developed for data problems. Where can you find this information in RStudio?
  • Packages tab
  • History tab
  • Environment tab
  • R console
  1. What tool gives data analysts the highest level of control over their data analysis?
  • Spreadsheet
  • SQL
  • Tableau
  • Programming language
  1. Using a programming language can help you with which aspects of data analysis? Select all that apply.
  • Visualize your data
  • Ask the right questions about your data
  • Transform your data
  • Clean your data
  1. What is the term for programming code that is freely available and may be modified and shared by the people who use it?
  • Open-source
  • Open-ended
  • Data-centric
  • Open-data
  1. For what reasons do many data analysts choose to use R? Select all that apply.
  • R can quickly process lots of data.
  • R is a data-centric programming language.
  • R can create high quality visualizations.
  • R is a closed source programming language.
  1. What is a benefit of using the R programming language for data analysis? Select all that apply.
  • It is the most popular machine-learning language.
  • It is a general-purpose programming language.
  • It can create world-class visualizations.
  • It can work with large amounts of data.
  1. RStudio’s integrated development environment lets you perform which of the following actions? Select all that apply.
  • Install R packages
  • Import data from spreadsheets
  • Create data visualizations
  • Stream online videos
  1. Fill in the blank: In RStudio, the _____ is where you can find all the data you currently have loaded, organize it, and save it.
  • source editor pane
  • environment pane
  • R console pane
  • plots pane
  1. Which of the following are benefits of open-source code? Select all that apply.
  • Anyone can pay a fee for access to the code.
  • Anyone can use the code for free.
  • Anyone can fix bugs in the code.
  • Anyone can create an add-on package for the code.
  1. A data analyst is searching for an open-source tool that will allow them to work with very large amounts of data. What tool is the best option?
  • Spreadsheet
  • JSON
  • R
  • Tableau
  1. In RStudio, where can you find and manage all the data you currently have loaded?
  • R console pane
  • Plots tab
  • Source editor pane
  • Environment pane
  1. What are the benefits of using a programming language for data analysis? Select all that apply.
  • Clarify the steps of the analysis
  • Easily reproduce and share the analysis
  • Automatically choose a topic for analysis
  • Efficiently save time
  1. What attribute of the R programming language makes it an open-source programming language?
  • The code is designed to be data-centric.
  • The code is open to processing large amounts of data.
  • The code is distributed by a company named “Open-Source.”
  • The code can be modified and shared by anyone who uses it.
  1. In which two parts of RStudio can you execute code? Select all that apply.
  • The environment pane
  • The source editor pane
  • The R console pane
  • The plots pane
  1. How do data analysts refer to the words and symbols they use to write instructions for computers?
  • Programming languages
  • Syntax languages
  • Code languages
  • Variable languages
  1. A data analyst wants to write R code in RStudio that will go away after they close their current session. Where should they write their code?
  • Environment tab
  • Source editor
  • Plots tab
  • R console

Week 2 – Programming using RStudio

A data analyst inputs the code print(100 / 10) in RStudio. What type of operator does the analyst use in the code?

  • Assignment
  • Conditional
  • Logical
  • Arithmetic

Which of the following is a best practice when naming variables in R?

  • Variable names should be verbs.
  • Variable names should start with special characters.
  • Use lowercase for variable names.
  • Use a space character to separate words in variable names.

1. A data analyst is assigning a variable to a value in their company’s sales dataset for 2020. Which variable name uses the correct syntax?

  • -sales-2020
  • 2020_sales
  • sales_2020
  • _2020sales
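
One way to check the four candidate names above is to compare each string with what base R's make.names() returns, since that function rewrites a string into a syntactically valid name. This is only a sketch: the variable names candidates and is_valid are made up for illustration, and the check covers syntax only, not style conventions.

```r
# Candidate names copied from the answer options above.
candidates <- c("-sales-2020", "2020_sales", "sales_2020", "_2020sales")

# make.names() rewrites a string into a syntactically valid R name.
# A string that comes back unchanged was already valid.
is_valid <- candidates == make.names(candidates)

print(setNames(is_valid, candidates))
```

Names that begin with a digit, an underscore, or a hyphen are rewritten by make.names(), which is why only the name that starts with a letter passes the check.
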
  1. You want to create a vector with the values 12, 23, 51, in that exact order. After specifying the variable, what R code chunk allows you to create the vector?
  • c(12, 23, 51)
  • v(12, 23, 51)
  • c(51, 23, 12)
  • v(51, 23, 12)
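
A minimal sketch of the vector question above, using only base R. The names values and reversed are hypothetical and chosen for this example.

```r
# c() combines its arguments into a vector and preserves their order.
values <- c(12, 23, 51)
print(values)

# Order matters: the same numbers in reverse are a different vector.
reversed <- c(51, 23, 12)
print(identical(values, reversed))
```
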
  1. An analyst runs code to convert string data into a date/time data type that results in the following: “2020-07-10”. Which of the following are examples of code that would lead to this return? Select all that apply.
  • mdy("July 10th, 2020")
  • ymd(20200710)
  • myd(2020, July 10)
  • dmy("7-10-2020")
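
The date-parsing options above can be tried directly. The sketch below assumes the lubridate package is installed and that the session uses an English locale, so the month name "July" is recognized. The variable target is a hypothetical helper used only for the comparison.

```r
library(lubridate)  # assumption: lubridate is installed

target <- as.Date("2020-07-10")

# The function name states the order of the components in the input.
print(mdy("July 10th, 2020") == target)  # month, day, year
print(ymd(20200710) == target)           # year, month, day (numeric input)

# dmy() reads "7-10-2020" as day 7, month 10, which is a different date.
print(dmy("7-10-2020"))
```
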
  1. A data analyst inputs the following code in RStudio:

change_1 <- 70

Which of the following types of operators does the analyst use in the code?

  • Assignment
  • Logical
  • Relational
  • Arithmetic
  1. A data analyst is deciding on naming conventions for an analysis that they are beginning in R. Which of the following rules are widely accepted stylistic conventions that the analyst should use when naming variables? Select all that apply.
  • Use single letters, such as “x” to name all variables
  • Use an underscore to separate words within a variable name
  • Begin all variable names with an underscore
  • Use all lowercase letters in variable names
  1. In R, what includes reusable functions and documentation about how to use the functions?
  • Pipes
  • Comments
  • Packages
  • Vectors
  1. Packages installed in RStudio are called from CRAN. CRAN is an online archive with R packages and other R-related resources.
  • True
  • False
  1. A data analyst is reviewing some code and finds the following code chunk:

mtcars %>%

filter(carb > 1) %>%

group_by(cyl) %>%

What is this code chunk an example of?

  • Pipe
  • Nested function
  • Vector
  • Data frame

Shuffle Q/A

  1. A data analyst finds the code mdy(10211020) in an R script. What is the year of the date that is created?
  • 1021
  • 1020
  • 1102
  • 2120
  1. Which of the following is a best practice when naming R script files?
  • R script file names should end in “.R”
  • R script file names should end in “.S”
  • R script file names should end in “.rscript”
  • R script file names should end in “.r-script”
  1. How are base packages different from recommended packages in the R package ecosystem?
  • Recommended packages are made by the community and base packages are not.
  • Base packages take longer to load than recommended packages.
  • Base packages are installed and loaded by default and recommended packages are not.
  • Recommended packages are more professionally designed than base packages.
  1. Why would a data analyst want to use the CRAN network when working with RStudio?
  • To add new operators to R
  • To install R packages
  • To add pipes to R
  • To install drivers to RStudio
  1. A data analyst wants to take a data frame named people and filter the data where age is 10, arranged by height, and grouped by gender. Which code snippet would perform those operations in the specified order?
  • where age is equal to 10
  1. Which of the following are examples of variable names that can be used in R? Select all that apply.
  • autos_5
  • utility2
  • 3_sales
  • red1
  1. You want to create a vector with the values 43, 56, 12 in that exact order. After specifying the variable, what R code chunk lets you create the vector?
  • c(43, 56, 12)
  • v(12, 56, 43)
  • v(43, 56, 12)
  • c(12, 56, 43)
  1. An analyst comes across dates listed as strings in a dataset. For example, December 10th, 2020. To convert the strings to a date/time data type, which function should the analyst use?
  • lubridate()
  • datetime()
  • now()
  • mdy()
  1. A data analyst inputs the following code in RStudio: sales_1 <- (3500.00 * 12) Which of the following types of operators does the analyst use in the code? Select all that apply.
  • Relational
  • Logical
  • Arithmetic
  • Assignment
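To make the operator categories concrete, here is a short base-R sketch built around the line from the question. The variables `above_target` and `in_range` and the thresholds are made up for contrast.

```r
# <- is the assignment operator; * is an arithmetic operator.
sales_1 <- (3500.00 * 12)
print(sales_1)

# For contrast: a relational operator compares values,
# and a logical operator combines TRUE/FALSE results.
above_target <- sales_1 > 40000                   # relational
in_range <- sales_1 > 40000 & sales_1 < 50000     # relational + logical
```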
  1. Which of the following files in R have names that follow widely accepted naming convention rules? Select all that apply.
  • patient_details_1.R
  • title*123.R
  • p1+infoonpatients.R
  • patient_data.R
  1. Which of the following are included in R packages? Select all that apply.
  • Naming conventions for R variable names
  • Reusable R functions
  • Tests for checking your code
  • Sample datasets
  1. What is the name of the popular package archive dedicated to supporting R users with authentic, validated code?
  • The CRAN archive
  • The RStudio website
  • The tidyverse
  • Python
  1. A data analyst writes the following code in a script and gets an error. What is wrong with their code?

penguins %>%

filter(flipper_length_mm == 200) %>%

group_by(species) %>%

summarize(mean = mean(body_mass_g)) %>%

  • They are using too many functions.
  • The last line should not have a pipe operator.
  • The first line should have a pipe operator before penguins.
  • They are using the wrong characters for the pipe operator.
  1. Fill in the blank: When creating a variable for use in R, your variable name should begin with _____.
  • an operator
  • a letter
  • an underscore
  • a number
  1. You want to create a vector with the values 21, 12, 39, in that exact order. After specifying the variable, what R code chunk lets you create the vector?
  • c(39, 12, 21)
  • v(39, 12, 21)
  • v(21, 12, 39)
  • c(21, 12, 39)
  1. If you use the mdy() function in R to convert the string "April 10, 2019", what will return when you run your code?
  • “4.10.19”
  • “4/10/2019”
  • “2019-10-4”
  • “2019-4-10”
  1. A data analyst wants to combine values using mathematical operations. What type of operator would they use to do this?
  • Arithmetic
  • Conditional
  • Logical
  • Assignment
  1. Which of the following files in R have names that follow widely accepted naming convention rules? Select all that apply.
  • p1+infoonpatients.R
  • patient_data.R
  • patient_details_1.R
  • title*123.R
  1. A data analyst wants to create functions, documentation, sample data sets, and code tests that they can share and reuse in other projects. What should they create to help them accomplish this?
  • A data frame
  • A tidyverse
  • A data type
  • A package
  1. A data analyst needs a system of packages that use a common design philosophy for data manipulation, exploration, and visualization. What set of packages fulfills their need?
  • Base
  • CRAN
  • tidyverse
  • Recommended
  1. Which of the following are examples of variable names that can be used in R? Select all that apply.
  • alpha_21
  • alpha21
  • tidyverse
  • Recommended
  1. What function is used to create vectors in the R programming language?
  • v()
  • c()
  • vector()
  • combine()
  1. What type of packages are automatically installed and loaded for use in RStudio when you start your first programming session?
  • Recommended packages
  • Base packages
  • Community packages
  • CRAN packages
  1. Why would you want to use pipes instead of nested functions in R? Select all that apply.
  • Pipes make it easier to add or remove functions.
  • Pipes make it easier to read long sequences of functions.
  • Nested functions are no longer supported by R.
  • Pipes allow you to combine more functions in a single sequence.
  1. Which of the following are examples of variable names that can be used in R?
  • value(2)
  • value-2
  • value_2
  • value%2
  1. A data analyst has a dataset that contains date strings like "January 10th, 2022." What lubridate function can they use to convert these strings to dates?
  • myd()
  • mdy()
  • dmy()
  • ymd()
  1. What is the relationship between RStudio and CRAN?
  • RStudio and CRAN are both environments where data analysts can program using R code.
  • CRAN creates visualizations based on an analyst’s programming in RStudio.
  • CRAN contains all of the data that RStudio users need for analysis.
  • RStudio installs packages from CRAN that are not in Base R.
  1. A data analyst previously created a series of nested functions that carry out multiple operations on some data in R. The analyst wants to complete the same operations but make the code easier to understand for their stakeholders. Which of the following can the analyst use to accomplish this?
  • Pipe
  • Comment
  • Argument
  • Vector
  1. A data analyst wants to assign the value 50 to the variable daily_dosage. Which of the following types of operators will they need to use in the code?
  • Relational
  • Arithmetic
  • Assignment
  1. A data analyst needs to find a package that offers a consistent set of functions that help them complete common data manipulation tasks like selecting and filtering. What tidyverse package provides this functionality?
  • tidyr
  • readr
  • ggplot2
  • dplyr
  1. When programming in R, what is a pipe used as an alternative for?
  • Nested function
  • Variable
  • Installed package
  • Vector
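Several questions in this week compare pipes with nested functions. The sketch below runs the same three steps both ways on the built-in mtcars data, assuming the dplyr package is installed; the column name `mean_mpg` is chosen here for illustration.

```r
library(dplyr)

# Nested form: read from the innermost call outward.
nested <- summarize(
  group_by(filter(mtcars, carb > 1), cyl),
  mean_mpg = mean(mpg)
)

# Piped form: the same steps, read top to bottom.
# Note that the last line has no trailing %>%.
piped <- mtcars %>%
  filter(carb > 1) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

# Both forms should produce the same summary table.
all.equal(nested, piped)
```

Ending a chunk with a dangling `%>%` leaves R waiting for another function, which is the error described in the penguins question above.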

Week 3 – Working with data in R

A data scientist is trying to print a data frame, but the console output produces too many rows and columns to be readable. What could they use instead of a data frame to make printing more readable?

  • A list
  • A structure
  • A tibble
  • A vector

A data analyst is working with a large data frame. It contains so many columns that they don’t all fit on the screen at once. The analyst wants a quick list of all of the column names to get a better idea of what is in their data. What function should they use?

  • colnames()
  • str()
  • mutate()
  • head()

You are working with the penguins dataset. You want to use the summarize() and min() functions to find the minimum value for the variable flipper_length_mm. At this point, the following code has already been written into your script:

penguins %>%

drop_na() %>%

group_by(species, sex) %>%

Add the code chunk that lets you find the minimum value for the variable flipper_length_mm.

(Note: do not type the above code into the code block editor, as it has already been inputted. Simply add a single line of code based on the prompt.)

What species and sex have the lowest minimum flipper length in mm?

  • Chinstrap males
  • Adelie females
  • Gentoo females
  • Gentoo males
  1. A data analyst is working with a dataset in R that has more than 50,000 observations. Why might they choose to use a tibble instead of the standard data frame? Select all that apply.
  • Tibbles can create row names
  • Tibbles automatically only preview the first 10 rows of data
  • Tibbles can automatically change the names of variables
  • Tibbles automatically only preview as many columns as fit on screen

2. A data analyst is exploring their data to get more familiar with it. They want a preview of just the first six rows to get a better idea of how the data frame is laid out. What function should they use?

  • print()
  • preview()
  • head()
  • colnames()
  1. You are working with the ToothGrowth dataset. You want to use the head() function to get a preview of the dataset. Write the code chunk that will give you this preview.

What are the names of the columns in the ToothGrowth dataset?

  • VC, supp, dose
  • len, supp, dose
  • len, supp, VC
  • len, VC, dose
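The preview functions named in these questions can be compared side by side in base R. ToothGrowth ships with R, so this sketch needs no extra packages; `preview` and `column_names` are illustrative variable names.

```r
# ToothGrowth is part of the built-in datasets package, so no library() call is needed.
head(ToothGrowth)        # first six rows by default
colnames(ToothGrowth)    # column names only
str(ToothGrowth)         # compact structure: dimensions, column types, sample values

# Keep the results so they can be inspected later.
preview <- head(ToothGrowth)
column_names <- colnames(ToothGrowth)
```

`head()` answers "what do the rows look like", `colnames()` answers "what are the columns called", and `str()` adds the data type of each column.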
  1. A data analyst is working with a data frame named sales. They write the following code:

sales %>%

The data frame contains a column named q1_sales. What code chunk does the analyst add to change the name of the column from q1_sales to quarter1_sales ?

  • rename(quarter1_sales = q1_sales)
  • rename(q1_sales <- "quarter1_sales")
  • rename(quarter1_sales <- "q1_sales")
  • rename(q1_sales = quarter1_sales)
  1. A data analyst is working with the penguins data. They write the following code:

penguins %>%

The variable species includes three penguin species: Adelie, Chinstrap, and Gentoo. What code chunk does the analyst add to create a data frame that only includes the Gentoo species?

  • filter(species == "Gentoo")
  • filter(species <- "Gentoo")
  • filter(Gentoo == species)
  • filter(species == "Adelie")
  1. You are working with the penguins dataset. You want to use the summarize() and max() functions to find the maximum value for the variable flipper_length_mm. You write the following code:

penguins %>%

drop_na() %>%

group_by(species) %>%

Add the code chunk that lets you find the maximum value for the variable flipper_length_mm.

What is the maximum flipper length in mm for the Gentoo species?

  • 200
  • 212
  • 210
  • 231
  1. A data analyst is working with a data frame called salary_data. They want to create a new column named total_wages that adds together data in the standard_wages and overtime_wages columns. What code chunk lets the analyst create the total_wages column?
  • mutate(salary_data, standard_wages = total_wages + overtime_wages)
  • mutate(salary_data, total_wages = standard_wages + overtime_wages)
  • mutate(salary_data, total_wages = standard_wages * overtime_wages)
  • mutate(total_wages = standard_wages + overtime_wages)
  1. A data analyst is working with a data frame named stores. It has separate columns for city (city) and state (state). The analyst wants to combine the two columns into a single column named location, with the city and state separated by a comma. What code chunk lets the analyst create the location column?
  • unite(stores, "location", city, state, sep=",")
  • unite(stores, "location", city, sep=",")
  • unite(stores, city, state, sep=",")
  • unite(stores, "location", city, state)
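The argument order of `unite()` can be seen in a small example. This sketch assumes the tidyr package is installed; the `stores` rows are invented for illustration.

```r
library(tidyr)

# Hypothetical stand-in for the stores data frame.
stores <- data.frame(
  city = c("Austin", "Denver"),
  state = c("TX", "CO")
)

# unite(data, new_column_name, columns to combine..., sep = separator)
stores_united <- unite(stores, "location", city, state, sep = ",")
print(stores_united)
```

By default `unite()` drops the input columns, so the result here has a single `location` column.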
  1. A data analyst writes the following code chunk to return a statistical summary of their dataset: quartet %>% group_by(set) %>% summarize(mean(x), sd(x), mean(y), sd(y), cor(x, y))

Which function will return the average value of the y column?

  • mean(y)
  • mean(x)
  • cor(x, y)
  • sd(x)
  1. A data analyst uses the bias() function to compare the actual outcome with the predicted outcome to determine if the model is biased. They get a score of 0.8. What does this mean?
  • Bias cannot be determined
  • The model is biased
  • Bias can be determined
  • The model is not biased

Shuffle Q/A

  1. What is an advantage of using data frames instead of tibbles?
  • Data frames allow you to create row names
  • Data frames make printing easier
  • Data frames allow you to use column names
  • Data frames never change variable names
  1. A data analyst is examining a new dataset for the first time. They load the dataset into a data frame to learn more about it. What function(s) will allow them to review the names of all of the columns in the data frame? Select all that apply.
  • colnames()
  • head()
  • str()
  • library()
  1. You are working with the ToothGrowth dataset. You want to use the skim_without_charts() function to get a comprehensive view of the dataset. Write the code chunk that will give you this view.

What is the average value of the len column?

  • 18.8
  • 13.1
  • 4.2
  • 7.65
  1. A data analyst is working with a data frame named cars. The analyst notices that all the column names in the data frame are capitalized. What code chunk lets the analyst change all the column names to lowercase?
  • rename_with(tolower, cars)
  • rename_with(cars, toupper)
  • rename_with(toupper, cars)
  • rename_with(cars, tolower)
  1. A data analyst is working with the penguins dataset and wants to sort the penguins by body_mass_g from least to greatest. When they run the following code the penguin body mass data is not displayed in the correct order.

penguins %>% arrange(body_mass_g)

head(penguins)

What can the data analyst do to fix their code?

  • Save the results of arrange() to a variable that gets passed to head()
  • Add a minus sign in front of body_mass_g to reverse the order
  • Correct the capitalization of arrange() to Arrange()
  • Use the print() function instead of the head() function
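The behavior behind this question is that `arrange()` returns a sorted copy rather than changing the original. The sketch below shows the same pattern on the built-in mtcars data (used here as a stand-in for penguins) and assumes dplyr is installed.

```r
library(dplyr)

# arrange() returns a new, sorted data frame; mtcars itself is left untouched.
mtcars %>% arrange(mpg)
head(mtcars)                 # still in the original order

# Saving the result makes the sorted rows available to head().
sorted_cars <- mtcars %>% arrange(mpg)
head(sorted_cars)
```

An equivalent fix is to pipe straight into `head()` instead of storing the intermediate result.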
  1. You are working with the penguins dataset. You want to use the summarize() and mean() functions to find the mean value for the variable body_mass_g. You write the following code:

penguins %>%

drop_na() %>%

group_by(species) %>%

Add the code chunk that lets you find the mean value for the variable body_mass_g.

What is the mean body mass in g for the Adelie species?

  • 3733.088
  • 5092.437
  • 3706.164
  • 4207.433
  1. A data analyst is working with a data frame called zoo_records. They want to create a new column named is_large_animal that signifies if an animal has a weight of more than 199 kilograms. What code chunk lets the analyst create the is_large_animal column?
  • zoo_records %>% mutate(is_large_animal = weight > 199)
  • zoo_records %>% mutate(weight > 199 = is_large_animal)
  • zoo_records %>% mutate(is_large_animal == weight > 199)
  • zoo_records %>% mutate(weight > 199 <- is_large_animal)
  1. A data analyst is working with a data frame named users. It has separate columns for first name (first_name) and last name (last_name). The analyst wants to combine the two columns into a single column called full_name, with the first name and last name separated by a space. What code chunk lets the analyst create the full_name column?
  • unite(users, first_name, last_name, "full_name", sep = " ")
  • unite(users, "full_name", first_name, last_name, sep = " ")
  • merge(users, "full_name", first_name, last_name, sep = " ")
  • unite(users, "full_name", first_name, last_name, sep = ", ")
  1. A data analyst is using statistical measures to get a better understanding of their data. What function can they use to determine how strongly two of the variables are related?
  • mean()
  • bias()
  • sd()
  • cor()
  1. A data analyst wants to find out how much the predicted outcome and the actual outcome of their data model differ. What function can they use to quickly measure this?
  • mean()
  • bias()
  • cor()
  • sd()
  1. A data analyst creates a data frame with data that has more than 50,000 observations in it. When they print their data frame, it slows down their console. To avoid this, they decide to switch to a tibble. Why would a tibble be more useful in this situation?
  • Tibbles won’t overload the console because they automatically only print the first 10 rows of data and as many variables as will fit on the screen
  • Tibbles will automatically change the names of variables to make them shorter and easier to read
  • Tibbles only include a limited number of data items
  • Tibbles will automatically create row names to make the data easier to read
  1. A data analyst wants to learn more about a specific data frame. Which function will allow them to review the data types of each column in the data frame?
  • package()
  • colnames()
  • library()
  • str()
  1. You have a data frame named employees with a column named Last_NAME. What will the name of the employees column be in the results of the function rename_with(employees, tolower)?
  • last_name
  • last_nAME
  • lAST_nAME
  • Last_NAME
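A short sketch of `rename_with()` applied to mixed-case column names, assuming dplyr is installed. The `employees` rows and the extra `ID` column are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the employees data frame.
employees <- data.frame(Last_NAME = c("Ng", "Okafor"), ID = c(1, 2))

# rename_with(data, function) applies the function to every column name.
renamed <- rename_with(employees, tolower)
colnames(renamed)
```

The data frame comes first and the function second, which is why `rename_with(tolower, cars)` in the earlier question is not the working form.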
  1. You are working with the penguins dataset. You want to use the summarize() and min() functions to find the minimum value for the variable bill_depth_mm. You write the following code:

penguins %>%

drop_na() %>%

group_by(species) %>%

Add the code chunk that lets you find the minimum value for the variable bill_depth_mm.

What is the minimum bill depth in mm for the Chinstrap species?

  • 16.4
  • 13.1
  • 15.5
  • 12.4
  1. A data analyst is working with a data frame called salary_data. They want to create a new column named hourly_salary that includes data from the wages column divided by 40. What code chunk lets the analyst create the hourly_salary column?
  • mutate(salary_data, hourly_salary = wages / 40)
  • mutate(salary_data, hourly_salary = wages * 40)
  • mutate(hourly_salary = wages / 40)
  • mutate(hourly_salary, salary_data = wages / 40)
  1. In R, which statistical measure demonstrates how strong the relationship is between two variables?
  • Correlation
  • Maximum
  • Standard deviation
  • Average
  1. A data analyst creates two different predictive models for the same dataset. They use the bias() function on both models. The first model has a bias of -40. The second model has a bias of 1. Which model is less biased?
  • The second model
  • It can’t be determined from this information
  • The first model
  1. What scenarios would prevent you from being able to use a tibble?
  • You need to create column names
  • You need to store numerical data
  • You need to create row names
  • You need to change the data types of inputs
  1. A data analyst is working with a data frame named salary_data. They want to create a new column named wages that includes data from the rate column multiplied by 40. What code chunk lets the analyst create the wages column?
  • mutate(salary_data, wages = rate * 40)
  • mutate(salary_data, wages = rate + 40)
  • mutate(wages = rate * 40)
  • mutate(salary_data, rate = wages * 40)
  1. A data analyst wants to check the average difference between the actual and predicted values of a model. What single function can they use to calculate this statistic?
  • bias()
  • cor()
  • sd()
  • mean()
  1. A data analyst is considering using tibbles instead of basic data frames. What are some of the limitations of tibbles? Select all that apply.
  • Tibbles can overload a console
  • Tibbles can never change the input type of the data
  • Tibbles won’t automatically change the names of variables
  1. A data analyst wants a high level summary of the structure of their data frame, including the column names, the number of rows and variables, and type of data within a given column. What function should they use?
  • colnames()
  • head()
  • rename_with()
  • str()
  1. You are working with the ToothGrowth dataset. You want to use the glimpse() function to get a quick summary of the dataset. Write the code chunk that will give you this summary.

How many variables does the ToothGrowth dataset contain?

  • 5
  • 4
  • 2
  • 3
  1. A data analyst is working with the penguins dataset in R. What code chunk will allow them to sort the penguins data by the variable bill_length_mm?
  • arrange(penguins, bill_length_mm)
  • arrange(bill_length_mm, penguins)
  • arrange(=bill_length_mm)
  1. A data analyst is working with a data frame called sales. In the data frame, a column named location represents data in the format "city, state". The analyst wants to split the city into an individual city column and state into a new country column. What code chunk lets the analyst split the location column?
  • separate(sales, location, into=c("country", "city"), sep=", ")
  • separate(sales, location, into=c("city", "country"), sep=", ")
  • untie(sales, location, into=c("city", "country"), sep=", ")
  • separate(sales, location, into=c("country", "city"), sep=" ")
  1. A data analyst is working with the penguins data. The variable species includes three penguin species: Adelie, Chinstrap, and Gentoo. The analyst wants to create a data frame that only includes the Adelie species. The analyst receives an error message when they run the following code:

penguins %>%

filter(species <- "Adelie")

How can the analyst change the second line of code to correct the error?

  • filter(Adelie == species)
  • filter("Adelie")
  • filter("Adelie" <- species)
  • filter(species == "Adelie")
  1. You are working with the penguins dataset and want to understand the year of data collection for all combinations of species, island, and sex. You write the following code:

penguins %>%

drop_na() %>%

group_by(species) %>%

summarize(min = min(year), max = max(year))

When you run the code in the code box, how many different groups are returned by this code chunk?

  • 3
  • 10
  • 2
  • 6
  1. You are working with the ToothGrowth dataset. You want to use the glimpse() function to get a quick summary of the dataset. Write the code chunk that will give you this summary.

How many different data types are used for the column data types?

  • 2
  • 3
  • 60
  • 1
  1. A data analyst is working with a data frame named customers. It has separate columns for area code (area_code) and phone number (phone_num). The analyst wants to combine the two columns into a single column called phone_number, with the area code and phone number separated by a hyphen. What code chunk lets the analyst create the phone_number column?
  • unite(customers, "phone_number", area_code, sep="-")
  • unite(customers, "phone_number", area_code, phone_num, sep="-")
  • unite(customers, "phone_number", area_code, phone_num)
  • unite(customers, area_code, phone_num, sep="-")
  1. You are compiling an analysis of the average monthly costs for your company. What summary statistic function should you use to calculate the average?
  • mean()
  • max()
  • cor()
  • min()
  1. A data analyst is studying weather data. They write the following code chunk:

bias(actual_temp, predicted_temp)

What will this code chunk calculate?

  • The average difference between the actual and predicted values
  • The maximum difference between the actual and predicted values
  • The total average of the values
  • The minimum difference between the actual and predicted values
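`bias()` is not part of base R, so the sketch below uses a hypothetical stand-in named `mean_difference`. It assumes, as the answer options suggest, that the statistic is the mean of actual minus predicted; the temperature readings are made up.

```r
# Hypothetical helper standing in for bias(); assumes bias = mean(actual - predicted).
mean_difference <- function(actual, predicted) {
  mean(actual - predicted)
}

# Made-up temperature readings for illustration.
actual_temp    <- c(68.3, 70.0, 72.4, 71.0, 67.0, 70.0)
predicted_temp <- c(67.9, 69.0, 71.5, 70.0, 67.0, 69.0)

# A positive result means the actual values run higher than the predictions on average.
result <- mean_difference(actual_temp, predicted_temp)
print(result)
```

Under this definition a result close to zero indicates little systematic over- or under-prediction, which is how the questions comparing two models can be read.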

Week 4 – More about visualizations, aesthetics, and annotations

A data analyst creates a scatterplot with many data points. The analyst wants to make some points on the plot more transparent than others. What aesthetic should the analyst use?

  • Alpha
  • Fill
  • Color
  • Shape

You are working with the diamonds dataset. You create a bar chart with the following code:

ggplot(data = diamonds) +

geom_bar(mapping = aes(x = color, fill = cut)) +

You want to use the facet_wrap() function to display subsets of your data. Add the code chunk that lets you facet your plot based on the variable cut.

How many subplots does your visualization show?

  • 6
  • 4
  • 3
  • 5
  1. Which of the following are benefits of using ggplot2? Select all that apply.
  • Customize the look and feel of your plot
  • Easily add layers to your plot
  • Combine data manipulation and visualization
  • Automatically clean data before creating a plot
  1. A data analyst creates a bar chart with the diamonds dataset. They begin with the following line of code:

ggplot(data = diamonds)

What symbol should the analyst put at the end of the line of code to add a layer to the plot?

  • pipe operator (%>%)
  • plus sign (+)
  • equal sign (=)
  • ampersand symbol (&)
  1. A data analyst creates a plot using the following code chunk:

ggplot(data = penguins) + geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

Which of the following represents a function in the code chunk? Select all that apply.

  • The aes function
  • The geom_point function
  • The data function
  • The ggplot function
  1. Fill in the blank: In ggplot2, the term mapping refers to the connection between variables and _____ .
  • data frames
  • geoms
  • facets
  • aesthetics
  1. A data analyst creates a scatterplot with a lot of data points. The analyst wants to make some points on the plot more transparent than others. What aesthetic should the analyst use?
  • Color
  • Shape
  • Alpha
  • Fill
  1. You are working with the penguins dataset. You create a scatterplot with the following code:

ggplot(data = penguins) +

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

You want to highlight the different penguin species on your plot. Add a code chunk to the second line of code to map the aesthetic shape to the variable species.

NOTE: the three dots (...) indicate where to add the code chunk.

Which penguin species does your visualization display?

  • Adelie, Chinstrap, Gentoo
  • Emperor, Chinstrap, Gentoo
  • Adelie, Chinstrap, Emperor
  • Adelie, Gentoo, Macaroni
  1. A data analyst creates a plot with the following code chunk:

ggplot(data = penguins) +

geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g))

What does the geom_jitter() function do to the points in the plot?

  • Adds a small amount of random shapes at each point in the plot
  • Decrease the size of each point in the plot
  • Adds a small amount of random noise to each point in the plot
  • Adds random colors to each point in the plot
  1. You are working with the diamonds dataset. You create a bar chart with the following code:

ggplot(data = diamonds) +

geom_bar(mapping = aes(x = color, fill = cut)) +

You want to use the facet_wrap() function to display subsets of your data. Add the code chunk that lets you facet your plot based on the variable clarity.

How many subplots does your visualization show?

  • 9
  • 6
  • 8
  • 7
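The two faceting questions come down to how many distinct values the faceting variable has. A minimal sketch, assuming ggplot2 is installed (the diamonds dataset ships with it); `clarity_plot` is an illustrative name.

```r
library(ggplot2)

# facet_wrap(~clarity) draws one panel per clarity level.
clarity_plot <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color, fill = cut)) +
  facet_wrap(~clarity)

# The number of panels equals the number of levels of the faceting variable.
nlevels(diamonds$clarity)
nlevels(diamonds$cut)
```

Printing `clarity_plot` in an interactive session displays the faceted chart; counting the levels is a quick way to check the panel count without reading the plot.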
  1. Fill in the blank: You can use the _____ function to put a text label on your plot to call out specific data points.
  • annotate()
  • ggplot()
  • facet_grid()
  • geom_smooth()
  1. You are working with the penguins dataset. You create a scatterplot with the following lines of code:

ggplot(data = penguins) +

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +

What code chunk do you add to the third line to save your plot as a png file with “penguins” as the file name?

  • ggsave("penguins")
  • ggsave(penguins.png)
  • ggsave("png.penguins")
  • ggsave("penguins.png")

Shuffle Q/A

  1. In ggplot2, what symbol do you use to add layers to your plot?
  • The pipe operator (%>%)
  • The plus sign (+)
  • The ampersand symbol (&)
  • The equals sign (=)
  1. A data analyst creates a plot using the following code chunk:

ggplot(data = buildings) +

geom_bar(mapping = aes(x = construction_year, color = height))

Which of the following represents an aesthetic attribute in the code chunk?

  • ggplot
  • construction_year
  • buildings
  • x
  1. Which code snippet will make all of the bars in the plot have different colors based on their heights?
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year), color=height)
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year)) + color("height")
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year, color=height))
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year)) + color(height)
  1. What is the purpose of the facet_wrap() function?
  • Modify the visual characteristic of a data point
  • Modify ggplot visuals to be three-dimensional
  • Create text inside a plot area
  • Create subplots in a grid of two variables
  1. A data analyst uses the annotate() function to create a text label for a plot. Which attributes of the text can the analyst change by adding code to the argument of the annotate() function? Select all that apply.
  • Change the font style of the text.
  • Change the color of the text.
  • Change the size of the text.
  • Change the text into a title for the plot.
  1. Which statement about the ggsave() function is correct?
  • ggsave() exports the last plot displayed by default.
  • ggsave() is run from the Plots Tab in RStudio.
  • ggsave() is the only way to export a plot.
  • ggsave() is unable to save .png files.
  1. Which of the following statements about ggplot is true?
  • ggplot allows analysts to create plots using a single function.
  • ggplot is the default plotting package in base R.
  • ggplot allows analysts to create different types of plots.
  • ggplot is designed to make cleaning data easy.
  1. A data analyst creates a plot using the following code chunk:

ggplot(data = buildings) +

geom_bar(mapping = aes(x = construction_year, color = height))

Which of the following represents a variable in the code chunk?

  • construction_year
  • mapping
  • data
  • ggplot
  1. Which code snippet will make all of the bars in the plot purple?
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year, color="purple"))
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year)) + color("purple")
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year, color=height))
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year), color="purple")
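The difference between mapping a color and setting a color depends on whether the argument sits inside `aes()`. The sketch below uses the built-in mtcars data as a stand-in for the buildings data frame and assumes ggplot2 is installed; the plot names are illustrative.

```r
library(ggplot2)

# Inside aes(): color is mapped to a column, so it varies with the data.
mapped <- ggplot(data = mtcars) +
  geom_bar(mapping = aes(x = cyl, color = factor(gear)))

# Outside aes(): color is a fixed setting applied to every bar outline.
fixed <- ggplot(data = mtcars) +
  geom_bar(mapping = aes(x = cyl), color = "purple")

# Pitfall: a constant inside aes() is treated as data, not as a color name.
pitfall <- ggplot(data = mtcars) +
  geom_bar(mapping = aes(x = cyl, color = "purple"))
```

For bars, `color` controls the outline and `fill` controls the interior, so a fully purple bar would also need `fill = "purple"` outside `aes()`.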
  1. A data analyst is working with the following plot and gets an error caused by a bug. What is the cause of the bug?

ggplot(data = penguins) %>%

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

  • The code uses a pipe instead of a plus sign.
  • A missing closing parenthesis needs to be added.
  • The pipe should be at the beginning of the second line.
  • A function name needs to be capitalized.
  1. You are working with the penguins dataset. You create a scatterplot with the following code chunk:

ggplot(data = penguins) +

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

You want to highlight the different penguin species in your plot. Add a code chunk to the second line of code to map the aesthetic size to the variable bill_depth_mm.

NOTE: the three dots (...) indicate where to add the code chunk. You may need to scroll in order to find the dots.

Which approximate range of bill depths does your visualization display?

  • 2 – 9
  • 31 – 40
  • 20 – 31
  • 14 – 20
  1. A data analyst has a scatter plot with crowded points that make it hard to identify a trend. What geometry function can they add to their plot to clearly indicate the trend of the data?
  • geom_alpha()
  • geom_bar()
  • geom_jitter()
  • geom_smooth()
  1. A data analyst wants to add a large piece of text above the grid area that clearly defines the purpose of a plot. Which ggplot function can they use to achieve this?
  • subtitle()
  • title()
  • labs()
  • annotate()
  1. By default, what plot does the ggsave() function export?
  • The plot defined in the plots.config file
  • The last displayed plot
  • The plot defined in the Plots Tab of RStudio
  • The first plot displayed
  1. Which of the following tasks can you complete with ggplot2 features? Select all that apply.
  • Customize the visual features of a plot
  • Automatically clean data before creating a plot
  • Add labels and annotations to a plot
  • Create many different types of plots
  1. A data analyst is working with the following plot and gets an error caused by a bug. What is the cause of the bug?

ggplot(data = penguins)

+ geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

  • The plus should be at the end of the first line.
  • A missing closing parenthesis needs to be added.
  • The code uses a plus sign instead of a pipe.
  • A function name needs to be capitalized.
  1. You are working with the penguins dataset. You create a scatterplot with the following code chunk:

ggplot(data = penguins) +

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

You want to highlight the different penguin species in your plot. Add a code chunk to the second line of code to map the aesthetic shape to the variable species.

NOTE: the three dots (...) indicate where to add the code chunk. You may need to scroll in order to find the dots.

Which species tends to have the longest flipper length and highest body mass?

  • Gentoo
  • Macaroni
  • Adelie
  • Chinstrap
  1. A data analyst creates a scatterplot where the points are very crowded, which makes it hard to notice when points are stacked. What change can they make to their scatter plot to make it easier to notice the stacked data points?
  • Change geom_point() to geom_jitter()
  • Change ggplot() to ggplot2()
  • Change the color of the points
  • Change the shape of the points
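A rough sketch of the two geoms mentioned in the questions above: geom_jitter() adds a small amount of random noise so stacked points separate, and geom_smooth() overlays a trend line. The data frame is a hypothetical sample, not the real penguins data.

```r
library(ggplot2)

# Hypothetical rows with deliberately overlapping points
penguins <- data.frame(
  flipper_length_mm = c(181, 181, 195, 195, 210, 217, 217, 230),
  body_mass_g = c(3750, 3750, 3800, 3800, 4400, 5000, 5000, 5700)
)

# geom_jitter() replaces geom_point(); geom_smooth() adds the trend line
p <- ggplot(data = penguins) +
  geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```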
  1. Which code snippet will make all of the bars in the plot have different colors and shapes based on their heights?
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year, color=[height, height]))
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year, color=height, shape=height))
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year, color=height), aes(shape=height))
  • ggplot(data = buildings) + geom_bar(mapping = aes(x = construction_year)) + color(height) + shape(height)
  1. You are working with the penguins dataset. You create a scatterplot with the following code:

ggplot(data = penguins) +

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

You want to highlight the different years of data collection on your plot. Add a code chunk to the second line of code to map the aesthetic size to the variable year.

NOTE: the three dots (...) indicate where to add the code chunk. You may need to scroll in order to find the dots.

What years does your visualization display?

  • 2006-2010
  • 2005-2009
  • 2007-2009
  • 2007-2011
  1. Fill in the blank: The _____ creates a scatterplot and then adds a small amount of random noise to each point in the plot to make the points easier to find.
  • geom_jitter() function
  • geom_point() function
  • geom_bar() function
  • geom_smooth() function
  1. A data analyst creates a plot using the following code chunk:

ggplot(data = buildings) +

geom_bar(mapping = aes(x = construction_year, color = height))

Which of the following represents a function in the code chunk?

  • The height function
  • The x function
  • The ggplot function
  • The mapping function
  1. A data analyst is working with the following plot and gets an error caused by a bug. What is the cause of the bug?

ggplot(data = penguins) +

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)

  • A missing closing parenthesis needs to be added.
  • The plus sign should be at the beginning of the second line.
  • The code uses a plus sign instead of a pipe.
  • A function name needs to be capitalized.
  1. Which of the following statements best describes a facet in ggplot?
  • Facets are the ggplot terminology for a chart axis.
  • Facets are subplots that display data for each value of a variable.
  • Facets are the visual characteristics of geometry objects.
  • Facets are the text used in and around plots.
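To illustrate facets as subplots, here is a minimal sketch that splits a scatterplot into one panel per value of a variable with facet_wrap(). The rows are hypothetical and only mimic the shape of the penguins data.

```r
library(ggplot2)

# Hypothetical rows shaped like the penguins data
penguins <- data.frame(
  species = c("Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo"),
  flipper_length_mm = c(181, 190, 195, 217, 230),
  body_mass_g = c(3750, 3650, 3800, 5000, 5700)
)

# One subplot is drawn for each distinct value of species
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  facet_wrap(~species)
```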
  1. Which of the following is a functionality of ggplot2?
  • Combine data manipulation and visualizations using pipes.
  • Filter and sort data in complex ways.
  • Define complex visualization using a single function.
  • Create plots using artificial intelligence.
  1. Which ggplot function is used to define the mappings of variables to visual representations of data?
  • annotate()
  • mapping()
  • aes()
  • ggplot()
  1. You are working with the penguins dataset. You create a scatterplot with the following code chunk:

ggplot(data = penguins) +

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

You want to highlight the different years of data collection on your plot. Add a code chunk to the second line of code to map the aesthetic alpha to the variable island.

NOTE: the three dots (...) indicate where to add the code chunk. You may need to scroll in order to find the dots.

What islands does your visualization display?

  • Biscoe, Dream, Torgersen
  • Cebu, Borneo, Torgersen
  • Cebu, Java, Hispaniola
  • Biscoe, Java, Buton
  1. What function creates a scatterplot and then adds a small amount of random noise to each point in the plot to make the points easier to find?
  • The geom_smooth() function
  • The geom_jitter() function
  • The geom_point() function
  • The geom_bar() function
  1. A data analyst wants to add text elements inside the grid area of their plot. Which ggplot function allows them to do this?
  • annotate()
  • labs()
  • facet()
  • text()
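The difference between labs() and annotate() can be sketched like this: labs() sets text outside the grid area, such as the title, while annotate() places text inside it and accepts styling arguments such as color, size, and fontface. The data, title, label text, and coordinates below are all illustrative assumptions.

```r
library(ggplot2)

# Hypothetical rows shaped like the penguins data
penguins <- data.frame(
  flipper_length_mm = c(181, 195, 210, 217, 230),
  body_mass_g = c(3750, 3800, 4400, 5000, 5700)
)

p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  # Title above the grid area
  labs(title = "Flipper length vs. body mass") +
  # Text label inside the grid area, with color, font style, and size changed
  annotate("text", x = 220, y = 3700, label = "Hypothetical note",
           color = "purple", fontface = "bold", size = 4.5)
```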
  1. You are working with the penguins dataset. You create a scatterplot with the following lines of code:

ggplot(data = penguins) +

geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +

What code chunk do you add to the third line to save your plot as a pdf file with “penguins” as the file name?

  • ggsave(penguins.pdf)
  • ggsave("pdf.penguins")
  • ggsave(=penguins)
  • ggsave("penguins.pdf")
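A minimal sketch of saving a plot with ggsave(). By default ggsave() saves the last plot displayed; this example passes the plot explicitly and writes to a temporary directory so it runs without side effects. The data, the out_file name, and the width and height values are assumptions.

```r
library(ggplot2)

# Hypothetical rows shaped like the penguins data
penguins <- data.frame(
  flipper_length_mm = c(181, 195, 210, 217, 230),
  body_mass_g = c(3750, 3800, 4400, 5000, 5700)
)

p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

# The file extension decides the output format (.pdf here)
out_file <- file.path(tempdir(), "penguins.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

In an interactive session, calling ggsave("penguins.pdf") right after displaying the plot would save that plot to the working directory.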

Week 5 – Documentation and reports

A data analyst wants to create a shareable report of their analysis with documentation of their process and notes explaining their code to stakeholders. What tool can they use to generate this?

  • R Markdown
  • Filters
  • Code chunks
  • Dashboards

A data analyst wants to add a bulleted list to their R Markdown document. What symbol can they type to create this formatting?

  • Delimiters
  • Hashtags
  • Brackets
  • Asterisks
  1. A data analyst wants to create documentation for their cleaning process so other analysts on their team can recreate this process. What tool can help them create this shareable report?
  • Code chunks
  • Inline code
  • Dashboards
  • R Markdown
  1. A data analyst wants to export their R Markdown notebook as a text document. What are the text document formats they can use to share their R Markdown notebook? Select all that apply.
  • Notepad
  • Word
  • PDF
  • HTML
  1. A data analyst writes two hashtags next to their header. What will this do to the header font in the .rmd file?
  • Make it bigger
  • Make it smaller
  • Make it centered
  • Make it a different color
  1. Fill in the blank: A data analyst includes _____ in their R Markdown notebook so that they can refer to it directly in their explanation of their analysis.
  • inline code
  • markdown
  • markdown
  • documentation
  1. What symbol can be used to add bullet points in R Markdown?
  • Asterisks
  • Brackets
  • Exclamation marks
  • Backticks
  1. A data analyst adds a section of executable code to their .rmd file so users can execute it and generate the correct output. What is this section of code called?
  • Data plot
  • Documentation
  • YAML
  • Code chunk
  1. A data analyst is inserting a line of code directly into their .rmd file. What will they use to mark the beginning and end of the code?
  • Delimiters
  • Asterisks
  • Markdown
  • Hashtags
  1. A data analyst who works with R creates a weekly sales report by remaking their .rmd file and converting it to a report. What can they do to streamline this process?
  • Create an R notebook
  • Knit their .rmd file
  • Convert their .rmd file
  • Create a template

Shuffle Q/A

  1. R Markdown is a file format for making dynamic documents with R. What are the benefits of creating this kind of document? Select all that apply.
  • Save, organize, and document code
  • Create a record of your cleaning process
  • Perform calculations for analysis more efficiently
  • Generate a report with executable code chunks
  1. A data analyst wants to change their header to be one font size smaller. What should they add to their markdown syntax?
  • Backtick
  • Exclamation mark
  • Double space
  • Hashtag
  1. A data analyst wants to include a line of code directly in their .rmd file in order to explain their process more clearly. What is this code called?
  • YAML
  • Markdown
  • Documented
  • Inline code
  1. Which sample correctly implements a code chunk in a .rmd file?
  • value <- 8
  • ```{r} value <- 8 ```
  • ### value <- 8
  • ```{!}value <- 8 ```
  1. What type of export document should you use while you are working and don’t need to worry about adding page breaks in the correct places?
  • HTML
  • YAML
  • PDF
  • Word
  1. What are the benefits of working with R Markdown? Select all that apply.
  • R Markdown runs interactive code chunks.
  • R Markdown runs R code faster.
  • R Markdown makes it possible to use larger datasets.
  • R Markdown allows styled text between code.
  1. A data analyst wants to change the default file format that gets exported by the Knit button to .pdf. What field of the YAML header should they change to set the new default file format?
  • export
  • title
  • output
  • author
  1. A data analyst is reading through an R Markdown notebook and finds the text _this is important_. What is the purpose of the underscore characters in this text?
  • They add the text as an image caption
  • They wrap the text in a clickable link
  • They style the text as bold
  • They style the text as italics
  1. A data analyst works with an .rmd file in RStudio and wants the ability to quickly find a code chunk using the label “analysis”. Which code example would allow the analyst to quickly access the code chunk using this label?
  • ```{analysis r}
  • ```analysis{r}
  • ```{r analysis}
  • ```{r} analysis
  1. Fill in the blank: A delimiter is a character that indicates the beginning or end of _____.
  • a data item
  • an analysis
  • a section
  • a header
  1. Fill in the blank: R Markdown notebooks can be converted into HTML, PDF, and Word documents, slide presentations, and _____.
  • dashboards
  • tables
  • YAML
  • spreadsheets
  1. Which code snippet implements the correct syntax for writing a piece of hyperlinked text in markdown?
  1. What is the purpose of the Knit button in RStudio?
  • It combines multiple .rmd files into a single file.
  • It imports the content from a .rmd file.
  • It creates a new .rmd file.
  • It exports the .rmd file to another document type.
  1. A data analyst wants to make a word in their markdown stand out by making it bold. What characters should they surround the text with to achieve the bold style?
  • Angle brackets (<>)
  • Double asterisks (**)
  • Double hashtag (##)
  • Single asterisk (*)
  1. A data analyst is working in a .rmd file and comes across the text ```{r analysis}. What is the purpose of the text “analysis”?
  • It is a label for the code chunk
  • It changes the way the code gets exported
  • It runs the code in analysis mode
  • It alters the output file format of Knit
  1. Why would a data analyst create a template of their .rmd file? Select all that apply.
  • To create an interactive notebook
  • To prevent other users from editing the file
  • To customize the appearance of a final report
  • To save time when creating the same kind of document
  1. A data analyst wants to perform an analysis and make it easy for colleagues to understand his process and update the analysis a year from now. Which tool is best to achieve this objective?
  • Code chunks
  • R Markdown
  • PDF document
  • Word Document
  1. A data analyst needs to create a shareable report in RStudio. They first want to change the default file format that gets exported by the Knit button to .pdf. What value should they use for the output field in the YAML header?
  • pdf_knit
  • pdf_document
  • document_pdf
  • knit_pdf
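To tie the R Markdown pieces together, the sketch below writes a tiny .Rmd file from R. It contains a YAML header whose output field is pdf_document, a level-two header, a code chunk labeled analysis, and a piece of inline code. The title, file name, and contents are hypothetical, and the file is only written, not knitted.

```r
# Build the three-backtick delimiter from single backticks
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Weekly sales report\"",  # hypothetical title
  "output: pdf_document",            # default format used by the Knit button
  "---",
  "",
  "## Analysis",                     # two hashtags: a smaller, second-level header
  "",
  paste0(fence, "{r analysis}"),     # start of a code chunk labeled "analysis"
  "value <- 8",
  fence,                             # end of the code chunk
  "",
  "The value is `r value`."          # inline code inside regular text
)

rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_path)
```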
  1. What does the ```{r} delimiter (three backticks followed by an r contained inside curly brackets) indicate in an R Markdown notebook?
  • The start of YAML metadata
  • The end of a code chunk
  • The end of YAML metadata
  • The start of a code chunk
  1. A data analyst notices that their header is much smaller than they wanted it to be. What happened?
  • They have too few asterisks
  • They have too many hashtags
  • They have too few hashtags
  • They have too many asterisks
  1. Fill in the blank: _____ code is code that can be inserted directly into a .rmd file.
  • Executable
  • Markdown
  • Inline
  • YAML
  1. Fill in the blank: If an analyst creates the same kind of document over and over or customizes the appearance of a final report, they can use _____ to save them time.
  • a filter
  • a code chunk
  • an .rmd file
  • a template
  1. Which combination of text characters can be used to embed an image in a markdown document?
  • ![]()
  • ##
  • <>
  • *[]()
  1. When you Knit a file in RStudio what part of code chunks are shown by default?
  • The delimiter
  • The output
  • The YAML
  • The code
  1. A data analyst comes across in a piece of markdown text. What effect do the angle brackets (<>) have on the inner text?
  • They create a piece of inline code
  • They create a clickable link
  • They create a bullet list
  • They create bold text

Course challenge

After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Bean.Type. You decide to use the select() function to create a new data frame with only these three variables.

Assume the first part of your code is:

trimmed_flavors_df <- flavors_df %>%

Add the code chunk that lets you select the three variables.

What bean type appears in row 6 of your tibble?

  • Beniano
  • Forastero
  • Criollo
  • Trinitario
  1. Scenario 1, questions 1-7

As part of the data science team at Gourmet Analytics, you use data analytics to advise companies in the food industry. You clean, organize, and visualize data to arrive at insights that will benefit your clients. As a member of a collaborative team, sharing your analysis with others is an important part of your job.

Your current client is Chocolate and Tea, an up-and-coming chain of cafes.

The eatery combines an extensive menu of fine teas with chocolate bars from around the world. Their diverse selection includes everything from plantain milk chocolate, to tangerine white chocolate, to dark chocolate with pistachio and fig. The encyclopedic list of chocolate bars is the basis of Chocolate and Tea’s brand appeal. Chocolate bar sales are the main driver of revenue.

Chocolate and Tea aims to serve chocolate bars that are highly rated by professional critics. They also continually adjust the menu to make sure it reflects the global diversity of chocolate production. The management team regularly updates the chocolate bar list in order to align with the latest ratings and to ensure that the list contains bars from a variety of countries.

They’ve asked you to collect and analyze data on the latest chocolate ratings. In particular, they’d like to know which countries produce the highest-rated bars of super dark chocolate (a high percentage of cocoa). This data will help them create their next chocolate bar menu.

Your team has received a dataset that features the latest ratings for thousands of chocolates from around the world. Given the data and the nature of the work you will do for your client, your team agrees to use R for this project.

You create a short document about the benefits of using R for the project and share the document with your team. You write that the benefits include R’s ability to quickly process lots of data and easily reproduce and share an analysis. What is another benefit of using R for the project?

  • Automatically clean data
  • Define a problem and ask the right questions
  • Create high-quality visualizations
  • Choose a topic for analysis
  1. Scenario 1, continued

Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load all the necessary libraries and packages. You upload a .csv file containing the data to RStudio and store it in a project folder named flavors_of_cacao.csv.

You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is flavors_df and the .csv file is in the working directory. What code chunk lets you create the data frame?

  • read_csv("flavors_of_cacao.csv") <- flavors_df
  • flavors_df <- read_csv("flavors_of_cacao.csv")
  • flavors_df + read_csv("flavors_of_cacao.csv")
  • read_csv(flavors_df <- "flavors_of_cacao.csv")
  1. Scenario 1, continued

Now that you’ve created a data frame, you want to find out more about how the data is organized. The data frame has hundreds of rows and lots of columns.

Assume the name of your data frame is flavors_df. What code chunk lets you review the column names in the data frame?

  • col(flavors_df)
  • rename(flavors_df)
  • colnames(flavors_df)
  • arrange(flavors_df)
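The import and inspection steps from the last two questions can be sketched as below. So that the example runs anywhere, it first writes a tiny placeholder CSV to a temporary directory; the two data rows are made up and do not come from the real flavors_of_cacao.csv.

```r
library(readr)

# Placeholder file standing in for the real dataset
csv_path <- file.path(tempdir(), "flavors_of_cacao.csv")
writeLines(c("Company,Rating", "Maker A,3.75", "Maker B,4.00"), csv_path)

# Assign the result of read_csv() to a data frame name with <-
flavors_df <- read_csv(csv_path, show_col_types = FALSE)

# Review the column names, then the overall structure
colnames(flavors_df)
str(flavors_df)
```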
  1. Scenario 1, continued

Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Maker (without a period at the end).

Assume the first part of your code chunk is:

flavors_df %>%

What code chunk do you add to change the column name?

  • rename(Maker %<% Company...Maker.if.known.)
  • rename(Company...Maker.if.known %<% Maker)
  • rename(Maker = Company...Maker.if.known.)
  • rename(Company...Maker.if.known. = Maker)
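As a quick sketch of the rename() pattern, where the new name goes on the left of the equals sign and the old name on the right, the following uses a two-row stand-in data frame. The rows and the name renamed_df are assumptions for illustration.

```r
library(dplyr)

# Hypothetical stand-in with the awkward original column name
flavors_df <- data.frame(
  Company...Maker.if.known. = c("Maker A", "Maker B"),
  Rating = c(3.5, 4.0)
)

# rename(new_name = old_name)
renamed_df <- flavors_df %>%
  rename(Maker = Company...Maker.if.known.)

colnames(renamed_df)
```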
  1. After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Company. You decide to use the select() function to create a new data frame with only these three variables.

Assume the first part of your code is:

trimmed_flavors_df <- flavors_df %>%

Add the code chunk that lets you select the three variables.

  • Videri
  • A. Morin
  • Soma
  • Rogue
  1. Next, you select the basic statistics that can help your team better understand the ratings system in your data.

Assume the first part of your code is:

trimmed_flavors_df %>%

You want to use the summarize() and max() functions to find the maximum rating for your data. Add the code chunk that lets you find the maximum value for the variable Rating.

What is the maximum rating?

  • 4.5
  • 5
  • 6
  • 5.5
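The summarize() and max() combination can be sketched on toy data as follows. The ratings below are invented, so the result says nothing about the real dataset; max_rating and rating_summary are illustrative names.

```r
library(dplyr)

# Hypothetical ratings, not taken from the real dataset
trimmed_flavors_df <- data.frame(
  Rating = c(2.75, 3.5, 4.0, 3.25),
  Cocoa.Percent = c(70, 72, 85, 64)
)

# summarize() collapses the data frame to one row holding the maximum
rating_summary <- trimmed_flavors_df %>%
  summarize(max_rating = max(Rating))
```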

  1. After completing your analysis of the rating system, you determine that any rating greater than or equal to 3.5 points can be considered a high rating. You also know that Chocolate and Tea considers a bar to be super dark chocolate if the bar's cocoa percent is greater than or equal to 70%. You decide to create a new data frame to find out which chocolate bars meet these two conditions.

Assume the first part of your code is:

best_trimmed_flavors_df <- trimmed_flavors_df %>%

You want to apply the filter() function to the variables Cocoa.Percent and Rating. Add the code chunk that lets you filter the data frame for chocolate bars that contain at least 70% cocoa and have a rating of at least 3.5 points.

  • 4.00
  • 4.25
  • 3.75
  • 3.50
  1. Now that you’ve cleaned and organized your data, you’re ready to create some useful data visualizations. Your team assigns you the task of creating a series of visualizations based on requests from the Chocolate and Tea management team. You decide to use ggplot2 to create your visuals.

Assume your first line of code is:

ggplot(data = best_trimmed_flavors_df) +

You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Rating on the x-axis.

  • 2
  • 5
  • 6
  • 3
  1. Your bar chart reveals the locations that produce the highest rated chocolate bars. To get a better idea of the specific rating for each location, you’d like to highlight each bar.

Assume that you are working with the following code:

ggplot(data = best_trimmed_flavors_df) +

geom_bar(mapping = aes(x = Company.Location))

Add a code chunk to the second line of code to map the aesthetic fill to the variable Rating.

NOTE: the three dots (...) indicate where to add the code chunk.

According to your bar chart, which two company locations produce the highest rated chocolate bars?

  • Canada and France
  • Scotland and U.S.A
  • Scotland and Canada
  • Amsterdam and France
  1. Scenario 2, continued

A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.

Assume your teammate shares the following code chunk:

ggplot(data = best_trimmed_flavors_df) +

geom_bar(mapping = aes(x = Cocoa.Percent)) +

What code chunk do you add to the third line to create wrap around facets of the variable Cocoa.Percent?

  • facet_wrap(Cocoa.Percent~)
  • facet_wrap(~Cocoa.Percent)
  • facet(=Cocoa.Percent)
  • facet_wrap(%>%Cocoa.Percent)
  1. Scenario 2, continued

Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.

Assume the first part of your code chunk is:

ggplot(data = trimmed_flavors_df) +

geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +

What code chunk do you add to the third line to add the title Recommended Bars to your plot?

  • labs(title = "Recommended Bars")
  • labs(title = Recommended Bars)
  • labs("Recommended Bars")
  • labs(title + "Recommended Bars")
  1. Scenario 2, continued

Next, you create a new scatterplot to explore the relationship between different variables. You want to save your plot so you can access it later on. You know that the ggsave() function defaults to saving the last plot that you displayed in RStudio, so you’re ready to write the code to save your scatterplot.

Assume your first two lines of code are:

ggplot(data = trimmed_flavors_df) +

geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +

What code chunk do you add to the third line to save your plot as a jpeg file with chocolate as the file name?

  • ggsave("chocolate.jpeg")
  • ggsave("chocolate.png")
  • ggsave("jpeg.chocolate")
  • ggsave(chocolate.jpeg)
  1. Scenario 2, continued

As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.

You decide to create an R Markdown notebook to document your work. What are your reasons for choosing an R Markdown notebook? Select all that apply.

  • It lets you record and share every step of your analysis
  • It allows users to run your code
  • It automatically creates a website to show your work
  • It displays your data visualizations

Shuffle Q/A

  1. Scenario 1, questions 1-7

As part of the data science team at Gourmet Analytics, you use data analytics to advise companies in the food industry. You clean, organize, and visualize data to arrive at insights that will benefit your clients. As a member of a collaborative team, sharing your analysis with others is an important part of your job.

Your current client is Chocolate and Tea, an up-and-coming chain of cafes.

The eatery combines an extensive menu of fine teas with chocolate bars from around the world. Their diverse selection includes everything from plantain milk chocolate, to tangerine white chocolate, to dark chocolate with pistachio and fig. The encyclopedic list of chocolate bars is the basis of Chocolate and Tea’s brand appeal. Chocolate bar sales are the main driver of revenue.

Chocolate and Tea aims to serve chocolate bars that are highly rated by professional critics. They also continually adjust the menu to make sure it reflects the global diversity of chocolate production. The management team regularly updates the chocolate bar list in order to align with the latest ratings and to ensure that the list contains bars from a variety of countries.

They’ve asked you to collect and analyze data on the latest chocolate ratings. In particular, they’d like to know which countries produce the highest-rated bars of super dark chocolate (a high percentage of cocoa). This data will help them create their next chocolate bar menu.

Your team has received a dataset that features the latest ratings for thousands of chocolates from around the world. Given the data and the nature of the work you will do for your client, your team agrees to use R for this project.

Your supervisor asks you to write a short summary of the benefits of using R for the project. Which of the following benefits would you include in your summary? Select all that apply.

  • Quickly process lots of data
  • Create high-quality data visualizations
  • Define a problem and ask the right questions
  • Easily reproduce and share the analysis
  1. Scenario 1, continued

Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load the tidyverse library. You upload a .csv file containing the data to RStudio and store it in a project folder named flavors_of_cacao.csv.

You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is bars_df and the .csv file is in the working directory. What code chunk lets you create the data frame?

  • bars_df %>% read_csv("flavors_of_cacao.csv")
  • read_csv("flavors_of_cacao.csv") + bars_df
  • bars_df <- read_csv("flavors_of_cacao.csv")
  • bars_df + read_csv("flavors_of_cacao.csv")
  1. Scenario 1, continued

Now that you’ve created a data frame, you want to find out more about how the data is organized. The data frame has hundreds of rows and lots of columns.

Assume the name of your data frame is flavors_df. What code chunk lets you review the structure of the data frame?

  • filter(flavors_df)
  • str(flavors_df)
  • select(flavors_df)
  • summarize(flavors_df)
  1. Scenario 1, continued

Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Brand (without a period at the end).

Assume the first part of your code chunk is:

flavors_df %>%

What code chunk do you add to change the column name?

  • rename(Brand = Company...Maker.if.known.)
  • rename(Company...Maker.if.known. = Brand)
  • rename(Company...Maker.if.known. , Brand)
  • rename(Brand, Company...Maker.if.known.)
  1. After previewing and cleaning your data, you determine what variables are most relevant to your analysis. Your main focus is on Rating, Cocoa.Percent, and Company.Location. You decide to use the select() function to create a new data frame with only these three variables.

Assume the first part of your code is:

trimmed_flavors_df <- flavors_df %>%

Add the code chunk that lets you select the three variables.

What company location appears in row 1 of your tibble?

  • Scotland
  • Canada
  • Colombia
  • France
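A minimal sketch of the select() step, assuming a stand-in data frame with one extra column to drop. The rows are invented and do not reflect the actual first row of the dataset.

```r
library(dplyr)

# Hypothetical stand-in for flavors_df
flavors_df <- data.frame(
  Company = c("Maker A", "Maker B"),
  Rating = c(3.75, 4.0),
  Cocoa.Percent = c(70, 85),
  Company.Location = c("Location A", "Location B")
)

# Keep only the three variables of interest, in the order listed
trimmed_flavors_df <- flavors_df %>%
  select(Rating, Cocoa.Percent, Company.Location)
```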
  1. After completing your analysis of the rating system, you determine that any rating greater than or equal to 3.75 points can be considered a high rating. You also know that Chocolate and Tea considers a bar to be super dark chocolate if the bar's cocoa percentage is greater than or equal to 80%. You decide to create a new data frame to find out which chocolate bars meet these two conditions.

Assume the first part of your code is:

best_trimmed_flavors_df <- trimmed_flavors_df %>%

You want to apply the filter() function to the variables Cocoa.Percent and Rating. Add the code chunk that lets you filter the new data frame for chocolate bars that contain at least 80% cocoa and have a rating of at least 3.75 points.

How many rows does your tibble include?

  • 22
  • 20
  • 12
  • 8
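The two-condition filter can be sketched as below. This toy data frame assumes Cocoa.Percent is stored as a number; in the real file it may be stored differently (for example as text with a percent sign), in which case the comparison would need adjusting. The row count here comes from the invented rows only.

```r
library(dplyr)

# Hypothetical rows; Cocoa.Percent is assumed to be numeric here
trimmed_flavors_df <- data.frame(
  Rating = c(3.75, 4.0, 3.5, 3.75),
  Cocoa.Percent = c(80, 70, 85, 88)
)

# Both conditions must hold for a row to be kept
best_trimmed_flavors_df <- trimmed_flavors_df %>%
  filter(Cocoa.Percent >= 80, Rating >= 3.75)

nrow(best_trimmed_flavors_df)
```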
  1. Scenario 2, continued

A teammate creates a new plot based on the chocolate bar data. The teammate asks you to make some revisions to their code.

Assume your teammate shares the following code chunk:

ggplot(data = best_trimmed_flavors_df) +

geom_bar(mapping = aes(x = Company)) +

What code chunk do you add to the third line to create wrap around facets of the variable Company?

  • facet(Company)
  • facet_wrap(+Company)
  • facet_wrap(~Company)
  • facet_wrap(=Company)
  1. Scenario 2, continued

Your team has created some basic visualizations to explore different aspects of the chocolate bar data. You’ve volunteered to add titles to the plots. You begin with a scatterplot.

Assume the first part of your code chunk is:

ggplot(data = trimmed_flavors_df) +

geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +

What code chunk do you add to the third line to add the title Suggested Chocolate to your plot?

  • labs(title = "Suggested Chocolate")
  • labs(Suggested Chocolate = title)
  • labs(Suggested Chocolate)
  • labs <- "Suggested Chocolate"
  1. Scenario 2, continued

Next, you create a new scatterplot to explore the relationship between different variables. You want to save your plot so you can access it later on. You know that the ggsave() function defaults to saving the last plot that you displayed in RStudio, so you’re ready to write the code to save your scatterplot.

Assume your first two lines of code are:

ggplot(data = trimmed_flavors_df) +

geom_point(mapping = aes(x = Cocoa.Percent, y = Rating)) +

What code chunk do you add to the third line to save your plot as a pdf file with “chocolate” as the file name?

  • ggsave("chocolate.png")
  • ggsave("chocolate.pdf")
  • ggsave("pdf.chocolate")
  • ggsave(chocolate.pdf)
  1. Scenario 2, continued

As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.

Fill in the blank: You want to record and share every step of your analysis, let teammates run your code, and display your visualizations. You decide to create _____ to document your work.

  • a database
  • a spreadsheet
  • an R Markdown notebook
  • a data frame
  1. Scenario 1, continued

Before you begin working with your data, you need to import it and save it as a data frame. To get started, you open your RStudio workspace and load the tidyverse library. You upload a .csv file containing the data to RStudio and store it in your project folder as flavors_of_cacao.csv.

You use the read_csv() function to import the data from the .csv file. Assume that the name of the data frame is chocolate_df and the .csv file is in the working directory. What code chunk lets you create the data frame?

  • read_csv("flavors_of_cacao.csv") + chocolate_df
  • chocolate_df <- "flavors_of_cacao.csv"(read_csv)
  • chocolate_df <- read_csv("flavors_of_cacao.csv")
  • chocolate_df + read_csv("flavors_of_cacao.csv")
  • Save, organize, and document code
  • Create a record of your cleaning process
  • Perform calculations for analysis more efficiently
  • Generate a report with executable code chunks
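
The import step from the flavors_of_cacao.csv question can be sketched as below. Because the real file is not available here, the example first writes a tiny invented CSV to a temporary folder; the column names and values are assumptions.

```r
library(readr)

# Write a small made-up CSV so the example is self-contained.
csv_path <- file.path(tempdir(), "flavors_of_cacao.csv")
writeLines(c("Company,Rating", "A,3.5", "B,4.0"), csv_path)

# read_csv() returns a tibble; the assignment arrow stores it as chocolate_df.
chocolate_df <- read_csv(csv_path, show_col_types = FALSE)
```

With the file in the working directory, the path would simply be "flavors_of_cacao.csv".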
  1. Scenario 1, continued

Next, you begin to clean your data. When you check out the column headings in your data frame you notice that the first column is named Company...Maker.if.known. (Note: The period after known is part of the variable name.) For the sake of clarity and consistency, you decide to rename this column Company (without a period at the end).

Assume the first part of your code chunk is:

flavors_df %>%

What code chunk do you add to change the column name?

  • rename(Company = Company...Maker.if.known.)
  • rename(Company...Maker.if.known. <- Company)
  • rename(Company...Maker.if.known. = Company)
  • rename(Company <- Company...Maker.if.known.)
  • Save, organize, and document code
  • Create a record of your cleaning process
  • Perform calculations for analysis more efficiently
  • Generate a report with executable code chunks
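
For the column-renaming question, a minimal sketch with an invented two-row flavors_df is shown below. dplyr's rename() uses the form new_name = old_name.

```r
library(dplyr)

# Hypothetical stand-in for flavors_df; only the column name matches the scenario.
flavors_df <- data.frame(
  Company...Maker.if.known. = c("A", "B"),
  Rating = c(3.5, 4.0)
)

# New name on the left, existing name on the right.
renamed_df <- flavors_df %>%
  rename(Company = Company...Maker.if.known.)
```

The result is stored in renamed_df (a name chosen for this example) so the original data frame stays unchanged.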
  1. Scenario 2, continued

As a final step in the analysis process, you create a report to document and share your work. Before you share your work with the management team at Chocolate and Tea, you are going to meet with your team and get feedback. Your team wants the documentation to include all your code and display all your visualizations.

You want to record and share every step of your analysis, let teammates run your code, and display your visualizations. What do you use to document your work?

  • A database
  • A spreadsheet
  • A data frame
  • An R Markdown notebook
  1. Next, you select the basic statistics that can help your team better understand the ratings system in your data.

Assume the first part of your code is:

trimmed_flavors_df %>%

You want to use the summarize() and mean() functions to find the mean rating for your data. Add the code chunk that lets you find the mean value for the variable Rating.

What is the mean rating?

  • 3.995445
  • 3.185933
  • 4.701337
  • 4.230765
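
The summarize() and mean() step might look like the sketch below. The data frame is invented, so its mean does not correspond to any of the answer options above; the output column name mean_rating is also an assumption.

```r
library(dplyr)

# Hypothetical stand-in for trimmed_flavors_df; these ratings are invented.
trimmed_flavors_df <- data.frame(
  Rating = c(3.0, 3.5, 4.0, 3.5)
)

# summarize() collapses the data frame to one row holding the mean of Rating.
rating_summary <- trimmed_flavors_df %>%
  summarize(mean_rating = mean(Rating))
```

For this toy data, rating_summary$mean_rating is 3.5.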
  1. Now that you’ve cleaned and organized your data, you’re ready to create some useful data visualizations. Your team assigns you the task of creating a series of visualizations based on requests from the Chocolate and Tea management team. You decide to use ggplot2 to create your visuals.

Assume your first line of code is:

ggplot(data = best_trimmed_flavors_df) +

You want to use the geom_bar() function to create a bar chart. Add the code chunk that lets you create a bar chart with the variable Company on the x-axis.

How many bars does your bar chart display?

  • 6
  • 4
  • 8
  • 10
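
One way to check the bar count without reading it off the chart is sketched below. geom_bar() draws one bar per distinct x value, so the number of rows in the computed layer data equals the number of bars. The data frame is invented, so the count here says nothing about the real dataset.

```r
library(ggplot2)

# Hypothetical stand-in for best_trimmed_flavors_df; these rows are invented.
best_trimmed_flavors_df <- data.frame(
  Company = c("A", "A", "B", "C", "C", "C")
)

p <- ggplot(data = best_trimmed_flavors_df) +
  geom_bar(mapping = aes(x = Company))

# layer_data() returns the data computed for the layer: one row per bar.
bar_count <- nrow(layer_data(p))
```

For this toy data, bar_count is 3, matching the three distinct Company values.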

Course 8 - Google Data Analytics Capstone: Complete a Case Study

Week 1 – Learn about capstone basics

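Test your knowledge on professional case studies

  1. Fill in the blank: A _____ is a collection of case studies that you can share with potential employers.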
  • portfolio
  • capstone
  • personal website
  • problem statement
  1. Which of the following are important strategies when completing a case study? Select all that apply.
  • Communicate the assumptions you made about the data
  • Use a programming language
  • Document the steps you’ve taken to reach your conclusion
  • Answer the question being asked
  1. To successfully complete a case study, your answer to the question the case study asks has to be perfect.
  • True
  • False
  1. Which of the following are qualities of the best portfolios for a junior data analyst? Select all that apply.
  • Personal
  • Unique
  • Large
  • Simple
  1. Which of the following are places where you can store and share your portfolio? Select all that apply.
  • Tableau
  • RStudio
  • GitHub
  • Kaggle

Shuffle Q/A

  1. Fill in the blank: A _____ is a collection of case studies that you can share with potential employers.
  • portfolio
  • capstone
  • problem statement
  • personal website

Week 3 – Optional: Using your portfolio

  1. An elevator pitch gives potential employers a quick, high-level understanding of your professional experience. What are the key considerations when creating an elevator pitch? Select all that apply.
  • Focus on your process over the results
  • Consider your audience’s interests
  • Keep it fresh by not over-practicing it
  • Make sure it’s short enough that it can be explained to someone during an elevator ride
  1. What are the key purposes of discussing a case study during an interview? Select all that apply.
  • Outline your thinking about a data analytics scenario for your interviewer
  • Ask your potential employer questions about the company
  • Negotiate a fair salary for the position
  • Recommend real-world solutions based on your own work
  1. If an interviewer says, “Tell me about yourself,” it’s important to limit your response to topics related to data analytics.
  • True
  • False
  1. During an interview, you will likely respond to technical questions, practical knowledge questions, and questions about your personal experiences. What strategies can help you prepare to respond effectively? Select all that apply.
  • Copy real-world examples from more experienced professionals to include in your responses
  • Write down your answers to common questions
  • Practice your responses until they feel natural and unrehearsed
  • Brainstorm examples from your own experiences that support your answers
  1. Imagine that an interviewer asks, “How do you maintain data integrity?” What topics does this question give you the opportunity to discuss? Select all that apply.
  • The reasons you strongly prefer SQL over spreadsheets for data cleaning
  • The impact that issues with your data can have on business decisions
  • The methods you would use for error checking and data validation
  • The importance of reliability and accuracy in good data analysis

Week 4

Did you complete a case study?

We hope you were excited about the opportunity to complete an optional case study in this course. It's a great way to showcase your new data analytics skills to potential employers.

Please let us know whether or not you completed a case study; you’ll be able to proceed with the course either way!

  • Yes, I completed a case study.
  • No, I skipped the case study.

Author: Twana Towne Ret
