Start With the Main Constraint
Decide what the state salary table is for before touching the outliers. A cleaning job, a comparison job, and a negotiation job do not use the same standard.
For a reporting table, the goal is accuracy and readability. For a relocation or salary-planning table, the goal is to show the middle of the market without letting a few extreme records rewrite the story. For an offer review, state rank matters less than occupation, level, and compensation structure.
State labels hide three common distortions: metro concentration, job-family mix, and public-sector share. A state with one dominant city does not behave like a state with a spread-out labor market. A state with lots of entry-level government roles does not compare cleanly to one with dense private-sector tech or finance pay.
The Comparison Points That Actually Matter
Use the metric that matches the decision, not the one that looks cleanest in a chart. Salary data is skewed, so the mean shifts fast when high earners sit in a small sample.
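IQR, the distance between the 25th and 75th percentiles, works better than a z-score screen for state salary tables. Z-scores assume a roughly symmetric, bell-shaped distribution, which salary data rarely has, and the mean and standard deviation behind them are themselves pulled by the extreme records. Percentiles show how wide the middle of the market runs, which matters more than a polished average.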
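A minimal sketch of that comparison, using Python's standard `statistics` module. The salary figures are invented for illustration, and the idea of comparing the z-score against a cutoff of 3 is a common convention assumed here, not something the text specifies.

```python
import statistics

# Hypothetical annual salaries for one state; the last record is a high earner.
salaries = [52_000, 55_000, 58_000, 60_000, 61_000,
            63_000, 65_000, 68_000, 72_000, 250_000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)

# quantiles(n=4) returns three cut points: Q1, the median, and Q3.
q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1

# z-score of the top record, using the sample standard deviation.
z_top = (salaries[-1] - mean) / statistics.stdev(salaries)

# Distance of the same record from the median, measured in IQR units.
iqr_units = (salaries[-1] - median) / iqr

print(f"mean={mean:,.0f}  median={median:,.0f}")
print(f"Q1={q1:,.0f}  Q3={q3:,.0f}  IQR={iqr:,.0f}")
print(f"top record: z-score={z_top:.2f}, IQRs from median={iqr_units:.1f}")
```

In this sample the single high record pulls the mean above the 75th percentile, yet its z-score stays under 3 because the same record inflates the standard deviation. The IQR view is not affected the same way.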
| Signal | What it usually means | Action |
|---|---|---|
| More than 1.5×IQR from the median | Review candidate, not automatic error | Check role, level, pay type, and location |
| More than 3×IQR from the median | Strong tail value or bad entry | Verify source fields, then keep or remove with a note |
| Mean is 20% or more above the median | High-end skew | Report median plus 25th and 75th percentiles |
| Fewer than 30 observations | State ranking is unstable | Combine years, widen the region, or stop ranking the state alone |
| One metro drives most of the sample | State average hides local pay | Split metro and non-metro values |
Use this table for state summaries, not raw audit records. Audit work needs the original values, because the tail tells you where the data broke, where the market paid up, and where the job mix changed.
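The first four rows of the table can be sketched as one function. This is an illustration under assumptions: the function name `summary_signals`, the sample data, and the choice of the inclusive quantile method are all hypothetical. It measures distance from the median because the table says so; the conventional box-plot fence measures 1.5×IQR beyond the quartiles instead, so adjust if your source follows that convention. The metro row is left out because it needs a location field.

```python
import statistics

def summary_signals(salaries):
    """Return the table's summary-level signals for one state's salaries."""
    mean = statistics.mean(salaries)
    median = statistics.median(salaries)
    q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
    iqr = q3 - q1

    review, strong_tail = [], []
    for value in salaries:
        distance = abs(value - median)
        if distance > 3 * iqr:
            strong_tail.append(value)   # verify source fields, then keep or remove
        elif distance > 1.5 * iqr:
            review.append(value)        # review candidate, not an automatic error

    return {
        "median": median,
        "p25": q1,
        "p75": q3,
        "review": review,
        "strong_tail": strong_tail,
        "high_end_skew": mean >= 1.2 * median,  # mean is 20% or more above median
        "thin_sample": len(salaries) < 30,      # state ranking is unstable
    }

# Hypothetical sample for one state.
sample = [52_000, 55_000, 58_000, 60_000, 61_000,
          63_000, 65_000, 68_000, 78_000, 250_000]
signals = summary_signals(sample)
for name, value in signals.items():
    print(f"{name}: {value}")
```

The function only flags records. Deciding whether a flagged value is an error, a mismatch, or a legitimate tail still needs the role, level, pay type, and location fields.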
The Compromise to Understand
Cleaner summaries sacrifice detail. Raw tables preserve detail but turn every salary chart into a noise problem.
Trimming outliers cuts visual clutter. It also hides the top end of the market, which matters if the state has a thin but real premium tier. Winsorizing keeps every record in the set, then caps the extremes. That helps dashboards stay readable, but it blurs the difference between a bad entry and a genuine high-pay role.
Use three layers instead of one blunt rule. Keep raw records for verification. Use trimmed or winsorized figures for internal dashboards. Use the median and percentile bands for career planning. That setup takes more work up front, but it reduces the chance that one extreme record rewrites the state story.
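The difference between trimming and winsorizing, and the three layers, can be sketched as follows. The 10th and 90th percentile bounds, the helper names `trim` and `winsorize`, and the sample data are assumptions chosen for illustration; the text does not prescribe specific cutoffs.

```python
import statistics

def trim(values, lower, upper):
    """Drop every record outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]

def winsorize(values, lower, upper):
    """Keep every record, but cap values at the bounds."""
    return [min(max(v, lower), upper) for v in values]

# Layer 1: raw records, kept untouched for verification.
raw = [52_000, 55_000, 58_000, 60_000, 61_000,
       63_000, 65_000, 68_000, 78_000, 250_000]

# Assumed bounds: the 10th and 90th percentiles of the raw data.
deciles = statistics.quantiles(raw, n=10, method="inclusive")
low, high = deciles[0], deciles[-1]

# Layer 2: trimmed or winsorized figures for internal dashboards.
trimmed = trim(raw, low, high)
capped = winsorize(raw, low, high)

# Layer 3: median and percentile band for career planning.
q1, median, q3 = statistics.quantiles(raw, n=4, method="inclusive")

print(f"raw mean:        {statistics.mean(raw):,.0f} ({len(raw)} records)")
print(f"trimmed mean:    {statistics.mean(trimmed):,.0f} ({len(trimmed)} records)")
print(f"winsorized mean: {statistics.mean(capped):,.0f} ({len(capped)} records)")
print(f"median {median:,.0f}, middle band {q1:,.0f} to {q3:,.0f}")
```

Note that the capped list no longer shows whether the top value was a bad entry or a genuine high-pay role, which is why the raw layer stays available.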
A state salary table needs a regular refresh because labor mix changes faster than a state code does. A multi-year average smooths spikes, but it also delays the signal when a major employer expands, contracts, or changes pay bands.
The First Decision Filter for Salary by State
Run the outlier through a simple order of checks before you label it as good, bad, or irrelevant.
- Impossible value: If the number is clearly wrong, such as a misplaced decimal, a doubled annualization, or a pay figure that ignores the unit, fix it from the source fields or drop it.
- Role mismatch: Split it out if the record belongs to a different occupation, seniority level, or employment type.
- Compensation mismatch: Separate base salary from bonus-heavy total comp.
- Geography mismatch: Pull remote jobs, metro jobs, and non-metro jobs into separate views.
- Legitimate tail: Keep it, but label it as a tail value and do not let it set the state average.
This filter matters because many salary outliers are data structure problems, not market surprises. An hourly role converted to annual pay creates a fake high or low point. A remote role with employer-based pay creates a geography mismatch. A stock-heavy total comp record makes a state look richer than its base salary market actually is.
The fast test is simple: if explaining the point requires a different role, pay type, or geography than the rest of the table, it does not belong in the summary statistic yet. It needs its own segment.
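The ordered checks can be sketched as a small classifier. Everything here is hypothetical: the `Record` fields, the `unit_ok` flag (standing in for whatever unit or decimal validation your source allows), and the simplification that any remote record counts as a geography mismatch. Employment type is omitted to keep the sketch short.

```python
from dataclasses import dataclass

@dataclass
class Record:
    salary: float
    occupation: str
    level: str
    pay_type: str   # "base" or "total_comp"
    remote: bool
    unit_ok: bool   # False if the unit, decimal, or annualization looks wrong

def classify(record, table_occupation, table_level):
    """Apply the checks in order and return the first label that fits."""
    if not record.unit_ok:
        return "impossible value"
    if record.occupation != table_occupation or record.level != table_level:
        return "role mismatch"
    if record.pay_type != "base":
        return "compensation mismatch"
    if record.remote:
        return "geography mismatch"
    return "legitimate tail"

# Hypothetical outliers from a table of senior software engineers.
outliers = [
    Record(1_240_000, "software engineer", "senior", "base", False, False),
    Record(310_000, "engineering director", "executive", "base", False, True),
    Record(420_000, "software engineer", "senior", "total_comp", False, True),
    Record(265_000, "software engineer", "senior", "base", True, True),
    Record(255_000, "software engineer", "senior", "base", False, True),
]
labels = [classify(r, "software engineer", "senior") for r in outliers]
for record, label in zip(outliers, labels):
    print(f"{record.salary:>12,.0f}  {label}")
```

The order matters: a record with a broken unit is never evaluated as a role or geography question, because its salary figure cannot be trusted yet.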
What This Looks Like in Practice
Treat the outlier as a clue, not a verdict. The best response changes with the pattern behind it.
- A single very high record in a state-wide tech table: Keep the record if the role and level match the rest of the dataset. Do not let it lift the state average. Compare the job family separately.
- A low record that came from hourly pay annualized by mistake: Fix the conversion before using the number anywhere else. A bad unit does not belong in a salary comparison at all.
- A state table with a large public-sector share: Separate public and private pay before summarizing. The combined number hides the actual market and makes the outliers look bigger than they are.
- A state dominated by one coastal metro: Use metro and non-metro views. The state average will track the metro, not the full labor market.
The key distinction is simple. If the outlier belongs to a different subgroup, segment it. If it belongs to the same subgroup, keep it and report the middle of the distribution alongside it.
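The metro case above can be sketched as a simple group-and-summarize step. The area labels and salary figures are invented for illustration.

```python
import statistics
from collections import defaultdict

# Hypothetical records for a state dominated by one metro.
records = [
    ("metro", 98_000), ("metro", 105_000), ("metro", 110_000),
    ("metro", 118_000), ("metro", 125_000), ("metro", 140_000),
    ("non_metro", 58_000), ("non_metro", 62_000),
]

# Segment the records by area before summarizing.
by_area = defaultdict(list)
for area, salary in records:
    by_area[area].append(salary)

state_median = statistics.median([salary for _, salary in records])
area_medians = {area: statistics.median(vals) for area, vals in by_area.items()}
metro_share = len(by_area["metro"]) / len(records)

print(f"state median: {state_median:,.0f}")
for area, med in area_medians.items():
    print(f"{area} median: {med:,.0f}")
print(f"metro share of sample: {metro_share:.0%}")
```

With three quarters of the sample in the metro, the state-wide figure sits close to the metro value and far from the non-metro one, so the two views are worth reporting separately.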
Constraints You Should Check
Verify the structure of the data before you decide what an outlier means. These six checks change the answer fast:
- Role family: Engineering, nursing, operations, and sales do not belong in the same state average.
- Seniority level: Entry-level and senior-level pay live in different bands.
- Pay type: Base salary, bonus, commission, and total compensation need separate handling.
- Work arrangement: Remote, hybrid, and on-site records do not tell the same geography story.
- Location rule: Some sources use worker residence, others use employer location, and some use both in mixed ways.
- Observation count: Below 30, the state summary is thin enough that one extreme record distorts the result.
Cost of living belongs after this cleanup, not before it. If the data is mixed or wrong at the source level, a cost-of-living adjustment only rescales the noise. It does not fix the structure.
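A short arithmetic sketch of why adjustment only rescales the noise. The records and the cost-of-living index of 1.25 are hypothetical; the point is that dividing every record by the same index leaves a bad entry exactly as far from the median, in relative terms, as it was before.

```python
import math
import statistics

# Hypothetical records; 6_500 is a decimal error for 65_000.
salaries = [6_500, 58_000, 62_000, 65_000, 70_000]
col_index = 1.25  # hypothetical index, where 1.0 is the national baseline

adjusted = [s / col_index for s in salaries]

# The bad record's position relative to the median, before and after adjustment.
ratio_before = salaries[0] / statistics.median(salaries)
ratio_after = adjusted[0] / statistics.median(adjusted)

print(f"before adjustment: {ratio_before:.3f} of the median")
print(f"after adjustment:  {ratio_after:.3f} of the median")
```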
When Another Path Makes More Sense
Use a different path when the state view is too broad for the decision in front of you. A state salary table works for screening, not for final judgment.
Choose occupation-by-state data when the question is where a specific role pays best. Choose metro-level data when the job search centers on one region. Choose percentile bands when the goal is salary negotiation. Choose multi-year averages only when the sample is thin and you need stability more than immediacy.
A single state number does one job well: it shows broad positioning. It does not show offer quality, niche role value, or the difference between a city core and the rest of the state. If the decision hinges on any of those, a state-wide average misses too much.
Decision Checklist
Use this before publishing or relying on a state salary summary:
- Every point beyond 1.5×IQR has been reviewed.
- The record belongs to the same occupation and level as the rest of the table.
- Base pay and total compensation are separated.
- Remote and on-site records are not blended without a note.
- The sample has at least 30 observations.
- One metro does not dominate the entire state view.
- The median and percentile bands are visible, not hidden behind one average.
- Every extreme point is tagged as error, mismatch, or legitimate tail.
If two or more boxes stay unchecked, do not use the state average as the headline number.
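The checklist rule can be expressed as a small gate. The key names and the example values are hypothetical; only the "two or more unchecked" threshold comes from the text.

```python
def can_headline_state_average(checklist):
    """Allow the state average as headline only if fewer than two boxes are unchecked."""
    unchecked = [item for item, done in checklist.items() if not done]
    return len(unchecked) < 2, unchecked

# Hypothetical review of one state summary.
checklist = {
    "outliers_reviewed": True,
    "same_occupation_and_level": True,
    "base_and_total_comp_separated": False,
    "remote_and_onsite_noted": True,
    "at_least_30_observations": False,
    "no_single_metro_dominates": True,
    "median_and_bands_visible": True,
    "extreme_points_tagged": True,
}

ok, unchecked = can_headline_state_average(checklist)
print("state average as headline:", "allowed" if ok else "not allowed")
print("unchecked:", ", ".join(unchecked) if unchecked else "none")
```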
Common Mistakes to Avoid
Do not delete every high salary point. That erases real premium markets and leaves only the middle of the pack. Do not keep every extreme point either. That turns a state summary into a pile of unrelated records.
Do not mix base salary with total compensation. A bonus-heavy record belongs in a different view from a straight salary record. Do not rank states by mean alone. The mean rises quickly when the top end is crowded.
Do not use one metro as a proxy for the whole state. That mistake turns geography into a pay illusion. Do not keep a state table stale for too long. Hiring mix shifts, job levels shift, and the outlier pattern shifts with them.
The clean-up rule is blunt: if the state summary depends on one unusually high or low record, the summary is not ready.
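One way to test that rule is a leave-one-out check: drop each record in turn and see how far the summary moves. This is a sketch under assumptions. The function name, the sample data, and the 5% threshold for "depends on one record" are all hypothetical choices, not part of the rule itself.

```python
import statistics

def max_leave_one_out_shift(values, stat=statistics.mean):
    """Largest relative change in `stat` when any single record is dropped."""
    full = stat(values)
    shifts = []
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]
        shifts.append(abs(stat(rest) - full) / full)
    return max(shifts)

# Hypothetical sample for one state.
sample = [52_000, 55_000, 58_000, 60_000, 61_000,
          63_000, 65_000, 68_000, 78_000, 250_000]

mean_shift = max_leave_one_out_shift(sample, statistics.mean)
median_shift = max_leave_one_out_shift(sample, statistics.median)

THRESHOLD = 0.05  # assumed cutoff for "the summary depends on one record"
print(f"mean:   max shift {mean_shift:.1%}, ready: {mean_shift <= THRESHOLD}")
print(f"median: max shift {median_shift:.1%}, ready: {median_shift <= THRESHOLD}")
```

On this sample the mean moves by more than a fifth when the top record is removed, while the median barely moves, which is the practical argument for leading with the median.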
The Bottom Line
Keep legitimate tail values, remove data errors, split mixed groups, and use median plus percentile bands for state-level salary decisions. That approach preserves the real market without letting one record hijack the result.
For career planning, the safest state comparison is the one that shows the middle clearly and leaves the tail visible as a separate signal.
Frequently Asked Questions
Should I delete a salary outlier from a state table?
Delete it only when the record is clearly wrong, such as a bad unit, a decimal error, or a mismatched role. Keep it when the pay fits the job, then summarize the state with the median and percentile bands.
Is the mean or median better for salary by state?
The median is better. The mean shifts upward fast when a few high earners sit in the sample, and it shifts downward when a state has a lot of entry-level records. Use the mean as a secondary check, not the headline.
How many observations do I need before I trust a state salary summary?
Thirty observations is the floor for a useful state summary. Below that, one extreme point changes the picture too much, so combine years, widen the geography, or move to occupation-level data.
What should I do when remote jobs create outliers?
Separate remote jobs before you judge the outlier. Remote pay follows employer policy and job level, not just the worker’s location, so blending it with on-site state records hides the real pattern.
Should cost-of-living adjustment come before outlier cleanup?
No. Clean the data first, then apply cost-of-living adjustment if the goal is comparison across places. Adjustment does not fix bad entries, mixed job levels, or compensation fields that were merged together.
What if one metro dominates the state salary data?
Split the metro from the rest of the state. The combined number hides the actual pay structure and turns a geography issue into a salary issue.
When does a high salary point count as a real signal?
It counts as a real signal when the role, level, and compensation type match the rest of the dataset and the point survives the 3×IQR review. At that point, keep it as a tail value and do not let it define the whole state.
Should I use trimmed or winsorized averages for salary by state?
Use trimmed or winsorized averages for dashboards and internal summaries. Use the median for career planning and negotiation, because it shows the center of the market without overreacting to the tail.