Salary of Data-related jobs: First Graph
- Open your Excel file in Tableau
- In the bottom section you can update the type of each variable (number, string, date, etc.)
Tableau splits up data into Measures and Dimensions
- Measures: Numeric values that Tableau aggregates; continuous by default
- Dimensions: Categorical values used to split up the data; discrete by default
- Note: Some variables can be continuous or discrete, measure or dimension. Sometimes you need to change it in Tableau to align with your analysis.
- Drag the variables into the desired section
- Variable drop down – You can rename the variable, change data type, etc
- You can search through your variables via the magnifying glass
Cards: Different sections in Tableau (Pages, Filters, Marks, Show Me)
Shelves: Rows, Columns
1st Question: How much are we likely to make in different data-related jobs?
- Dependent variable: Paid wage per year (Rows)
- Independent variable: Job title subgroup (Columns)
Use the drop down on each variable to confirm it is calculating the value you want (Sum, Average, Median, etc.)
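To see why the choice of aggregation matters, here is a minimal Python sketch of what Sum, Average, and Median would return for each job title subgroup. The rows are made-up values for illustration only, and the helper name aggregate_by_group is hypothetical; this is not how Tableau computes anything internally.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (job title subgroup, paid wage per year) rows, made up for illustration.
rows = [
    ("data analyst", 60000),
    ("data analyst", 65000),
    ("data analyst", 70000),
    ("data scientist", 90000),
    ("data scientist", 100000),
    ("data scientist", 200000),  # one unusually high wage
]

def aggregate_by_group(rows, func):
    """Apply one aggregation function to the wages of each group."""
    groups = defaultdict(list)
    for title, wage in rows:
        groups[title].append(wage)
    return {title: func(wages) for title, wages in groups.items()}

totals = aggregate_by_group(rows, sum)
averages = aggregate_by_group(rows, mean)
medians = aggregate_by_group(rows, median)

for title in totals:
    print(title, totals[title], averages[title], medians[title])
```

With these made-up rows, the single high wage pulls the data scientist average well above the median, which is why it is worth confirming the aggregation before reading the bars.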
Formatting and Exporting Graphs:
- Rearrange columns: Hover over the label and click the bar group (sort) icon. One click sorts from largest to smallest, two clicks from smallest to largest, and three clicks restore the original order.
- Change vertical bars to horizontal: Click the “swap” symbol, which looks like an upside-down L
- Edit text: Right click on the axis and click “Format”
- Change color, size, or label: Click the respective button on the Marks card
- Add a title: Right click and select “Title”
How to export graph:
- Worksheet –> Export –> Image
- Select what you would like to export
- Give title
- Print to PDF
Digging Deeper Using Rows and Columns Shelves
Hint – Click on drop down on variable to change default aggregation for each variable so you don’t always have to change later (ex. from sum to median)
You can add a second graph by clicking on the “worksheet” title at the bottom of the screen, and copying and pasting.
To find out more about the data, right click on a bar and select “view data”. The underlying data will show you the source data where the value came from.
There are some things that you can only determine by looking through the raw data. However, Tableau can visualize large amounts of data much faster and surface insights quickly.
Tooltip – Shows certain variables when you hover over a bar. You can manually add a variable by dropping it on “Detail” on the Marks card.
You can add the dependent variable back to Rows but select “Standard Deviation” to show the spread of values. If the standard deviation is very high, then it is likely there are some outliers. Outliers can have a dramatic impact on the data analysis.
If you notice outliers, add to data analysis plan for later investigation.
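As a rough illustration of how one extreme value inflates the standard deviation, the sketch below compares two groups of made-up wages. The numbers and the helper name spread_report are hypothetical, and the sample standard deviation is used here as an assumption.

```python
from statistics import mean, stdev

# Hypothetical wages per job title, made up for illustration.
wages_by_title = {
    "data analyst": [60000, 62000, 65000, 68000, 70000],
    "attorney": [80000, 85000, 90000, 95000, 900000],  # one extreme value
}

def spread_report(wages_by_title):
    """Return {title: (mean, sample standard deviation)} for each group."""
    return {t: (mean(w), stdev(w)) for t, w in wages_by_title.items()}

report = spread_report(wages_by_title)
for title, (avg, sd) in report.items():
    print(f"{title}: mean={avg:.0f}, sd={sd:.0f}")
```

In this toy data the group containing the extreme value has a standard deviation larger than its own mean, which is the kind of signal worth noting for later investigation.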
If you hold the shift key, you can drag the variable directly from the Rows shelf to the Marks card so that its description appears when you hover over the bar.
Understanding the Marks Card
The Marks card generally has two overall functions:
- If you are using an area chart (pie, bubble, etc.), the Marks card allows you to determine which variable is associated with that area.
- Define anything you cannot define in the rows or the columns.
Removing Outliers Using Scatterplots, Filtering, and Groups
A very large standard deviation (SD) usually indicates outliers. In our data, “attorney” has by far the largest SD and is really the only occupation with a significant one.
Scatterplots can help us identify if outliers exist.
- Navigate to Analysis –> Uncheck “Aggregate Measures”
- This will show a separate point for each row of data instead of creating a clean bar graph
- Change the size on the marks card to make the graph easier to read
- The graph will now show if there are true outliers
- To view the detail, right click on the data point and click “view data”
There are two ways to view the graph without the outlier data:
- Use a filter to prevent the graph from showing data above or below a certain value:
  - Drag the data variable to the “Filters” card and select the values you want to view.
  - If you right click on the filter and select “Show Filter”, the dynamic filter menu will appear on the right and you can adjust it as you please.
- Group the outliers into their own group and use a filter:
  - Add a unique data variable to “Detail” on the Marks card; in our case, “case number”.
  - Highlight the data points you want to group, right click, and select group by case number.
  - Case number (group) is now its own data variable on the left toolbar.
  - You can now drag that variable to the “Filters” card and filter on it.
To investigate your outliers you can now use your subgroup as its own graph. Add the subgroup variable as a column and filter for your job title of interest.
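The two approaches to hiding outliers can be sketched in Python as follows. The records, field names, and function names are all hypothetical and only illustrate the logic of a range filter versus a group built from a unique identifier.

```python
# Hypothetical records, made up for illustration; "case_number" is the unique key.
records = [
    {"case_number": "A1", "job_title": "attorney", "wage": 85000},
    {"case_number": "A2", "job_title": "attorney", "wage": 90000},
    {"case_number": "A3", "job_title": "attorney", "wage": 900000},
    {"case_number": "B1", "job_title": "data analyst", "wage": 65000},
]

def filter_by_range(records, low, high):
    """Approach 1: keep only records whose wage lies in [low, high]."""
    return [r for r in records if low <= r["wage"] <= high]

def split_by_group(records, outlier_cases):
    """Approach 2: tag chosen case numbers as a group, then split on it."""
    inside = [r for r in records if r["case_number"] in outlier_cases]
    outside = [r for r in records if r["case_number"] not in outlier_cases]
    return inside, outside

kept = filter_by_range(records, 0, 200000)
outliers, rest = split_by_group(records, {"A3"})
print(len(kept), [r["case_number"] for r in outliers])
```

The group approach keeps the outliers available as their own subset, which is what makes the follow-up investigation possible.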
Analyzing Salary Data Across States Utilizing Filters and Groups
Once you group your particular headers, click “include other” to put everything you did not group into an “Other” bucket.
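The idea behind the “Other” bucket is a lookup with a fallback. Below is a minimal sketch, assuming a made-up mapping of states to regions; the mapping, the state list, and the function name assign_group are hypothetical.

```python
from collections import Counter

# Hypothetical state-to-region grouping, made up for illustration.
region_of_state = {
    "CA": "West", "WA": "West",
    "NY": "Northeast", "MA": "Northeast",
}

def assign_group(state, mapping, other_label="Other"):
    """Return the named group for a state, or the catch-all label."""
    return mapping.get(state, other_label)

states = ["CA", "NY", "TX", "WA", "FL", "MA"]
groups = [assign_group(s, region_of_state) for s in states]
counts = Counter(groups)
print(counts)
```

Any state that was not explicitly grouped (here TX and FL) falls into “Other”, so no rows are dropped from the chart.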
To make the chart stand out more, drag the variable you would like to color by onto “Color” on the Marks card.
You can also rearrange the job title subgroups by simply dragging and dropping them in a different order on the left-hand side.
When to Use Line Graphs
- Show how things change over time
- Show how two continuous variables are related to each other
Tableau can use date as a continuous or discrete variable
Dates as Hierarchical Dimensions or Measures
Are data analyst jobs increasing with time?
By default, Tableau places dates in a hierarchy when they are used as a dimension (discrete variable)
- (Week #)
This is customizable if you so choose!
To create your own hierarchy, drag one variable onto the other.
To change a date variable from a dimension to a measure (discrete to continuous), click the arrow to bring up the drop down and select your level of specificity from the second block of options.
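One way to picture the difference between a discrete and a continuous date is sketched below in Python. It assumes the first block of options produces a date part (such as the month number) and the second block produces a date value truncated to that level; the dates and function names are hypothetical.

```python
from datetime import date

def month_part(d):
    """Discrete date part: the month number, shared across years."""
    return d.month

def month_value(d):
    """Continuous date value: the date truncated to the first of its month."""
    return d.replace(day=1)

# Hypothetical dates, made up for illustration.
dates = [date(2012, 3, 14), date(2013, 3, 2), date(2013, 7, 30)]

parts = [month_part(d) for d in dates]
values = [month_value(d) for d in dates]
print(parts)
print(values)
```

The discrete version puts March 2012 and March 2013 in the same bucket, while the continuous version keeps them as separate points on a timeline.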
Analyzing Data-Related Salaries Over Time Using Date Hierarchies
In order to determine if salaries are changing over time by job title, we need to add the “Job title subgroup” to the graph. We cannot add it to column or rows so we add it to the color section of the marks card.
Since not all jobs are represented in 2008 we should put in a filter to remove 2008 from the graph.
You can use the “highlight” selected items tool to make the graph more visually appealing.
Since most of the lines seem flat we can use a scatter plot to dig a little deeper.
Navigate to the “Analysis” drop down and deselect “Aggregate Measures”. Based on the results, it looks like the highest and lowest values are moving in their respective directions.
By looking at how the maximum and minimum salaries change over time, we can see that the maximum is increasing while the minimum is decreasing.
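A flat aggregate line can hide movement at the extremes. The sketch below, using made-up (year, wage) pairs and a hypothetical helper named extremes_by_year, computes the minimum, median, and maximum per year to show the pattern.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (year, wage) pairs, made up for illustration.
observations = [
    (2009, 50000), (2009, 70000), (2009, 120000),
    (2010, 48000), (2010, 72000), (2010, 130000),
    (2011, 45000), (2011, 71000), (2011, 150000),
]

def extremes_by_year(observations):
    """Return {year: (min wage, median wage, max wage)}."""
    by_year = defaultdict(list)
    for year, wage in observations:
        by_year[year].append(wage)
    return {y: (min(w), median(w), max(w)) for y, w in sorted(by_year.items())}

extremes = extremes_by_year(observations)
for year, (lo, mid, hi) in extremes.items():
    print(year, lo, mid, hi)
```

In this toy data the median barely moves while the minimum falls and the maximum rises each year.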
Analyzing Data-Related Salaries Over Time Using Trend Lines
Trend lines help narrow down which statistical models are worth your time.
Regression: Places a best fit (trend line) through the data points on the graph. Before adding the trend line we need to convert our line graph to circles.
Right click on graph and select “show trend line”
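The p-value indicates whether the trend is statistically significant, that is, how unlikely it would be to see a trend this strong if there were no real relationship.
p < .05: a result like this would occur by chance less than 5% of the time if there were no real trend
p < .01: a result like this would occur by chance less than 1% of the time if there were no real trend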
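To make the idea of a best-fit line and its p-value concrete, here is a minimal Python sketch. It fits a least-squares slope and then estimates a p-value with a permutation test, which is a swapped-in technique chosen because it needs only the standard library; it is not necessarily how Tableau computes the p-value it reports. The data points and function names are hypothetical.

```python
import random

def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Share of shuffles whose |slope| is at least the observed |slope|."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials

# Hypothetical (year, wage) points, made up for illustration.
years = [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016]
wages = [61000, 63500, 62000, 66000, 67500, 66500, 70000, 71500]

b = slope(years, wages)
p = permutation_p_value(years, wages)
print(f"slope={b:.1f} per year, p={p:.4f}")
```

Shuffling the wages breaks any real link between year and wage, so the share of shuffles that produce a slope as steep as the observed one approximates how often chance alone would produce such a trend.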
Analyzing Data-Related Salaries Over Time Using Box Plots
If you treat dates as discrete variables you can use box plots.
Box plots are great for analyzing data but terrible for showing to your stakeholders.
The center line = median
50% of all data points fall between the top and bottom lines of the box; 25% are above the top line and 25% are below the bottom line. Outliers lie beyond the whiskers.
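The numbers behind a box plot can be sketched in Python as follows. This assumes the common convention of flagging points more than 1.5 times the interquartile range (IQR) beyond the box as outliers; quartile conventions differ between tools, so the exact values may not match what Tableau draws. The wages and the function name box_plot_summary are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles plus the points lying beyond the 1.5 * IQR fences."""
    q1, q2, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": q2, "q3": q3, "outliers": outliers}

# Hypothetical wages, made up for illustration.
wages = [52000, 58000, 61000, 64000, 66000, 70000, 75000, 250000]
summary = box_plot_summary(wages)
print(summary)
```

Half of the values lie between q1 and q3 (the box), and only the one extreme wage falls outside the fences in this toy data.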