Anonymous — the dedup module mirrored our Friday export chaos; still wish the pandas appendix had one more video.
Data Cleanup
CSV Rescue Lab: Cleanup Without Clicking
Normalize messy exports, detect duplicate keys, and emit a reconciliation-ready workbook.
Program narrative
Automation should feel calmer than manual clicking. You will profile messy CSVs, standardize headers, and emit annotated workbooks that highlight rows needing human eyes. The capstone mirrors a real operations desk handoff.
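As a rough illustration of the "standardize headers" step, here is a minimal sketch using only the Python standard library. The function names and the snake_case convention are assumptions made for this example, not the lab's actual code; your team may settle on different rules.

```python
import re


def standardize_header(name: str) -> str:
    """Trim, lowercase, and snake_case one raw column header.

    Assumption: headers are ASCII. Any other character (accents included)
    is collapsed into an underscore by the pattern below.
    """
    cleaned = name.strip().lower()
    cleaned = re.sub(r"[^a-z0-9]+", "_", cleaned)
    return cleaned.strip("_")


def standardize_headers(headers):
    """Standardize every header; repeated names get a numeric suffix."""
    seen = {}
    result = []
    for raw in headers:
        # Fall back to a placeholder when nothing usable is left.
        base = standardize_header(raw) or "column"
        count = seen.get(base, 0) + 1
        seen[base] = count
        result.append(base if count == 1 else f"{base}_{count}")
    return result
```

Called on `[" Order ID ", "order-id", "Amount (USD)", ""]`, this returns `["order_id", "order_id_2", "amount_usd", "column"]`. The suffixing keeps colliding headers distinguishable instead of silently overwriting one column with another.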
Inside the bundle
- Profiling notebook with histograms for string lengths
- Dedup strategies that preserve the newest row automatically
- Annotation columns that explain why a row was flagged
- Packaging the flow as a CLI your team can schedule
- Mentor session on communicating findings to stakeholders
- Template for archiving raw files before transforms
- Optional pandas track for teams already using notebooks
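To make the dedup and annotation items above more concrete, here is one possible sketch of a keep-newest strategy that also records why each dropped row was flagged. The column names `order_id` and `updated_at`, the `flag_reason` wording, and the assumption that timestamps are ISO 8601 strings are all hypothetical choices for this example.

```python
from datetime import datetime


def dedup_keep_newest(rows, key="order_id", ts="updated_at"):
    """Return (kept, flagged) for a list of dict rows.

    For each duplicate key, the row with the latest timestamp is kept;
    on a tie, the row seen first wins. Every displaced row is copied
    into `flagged` with a `flag_reason` column explaining the decision.
    Assumes `ts` values parse with datetime.fromisoformat.
    """
    newest = {}
    flagged = []
    for row in rows:
        k = row[key]
        current = newest.get(k)
        if current is None:
            newest[k] = row
            continue
        if datetime.fromisoformat(row[ts]) > datetime.fromisoformat(current[ts]):
            older, newest[k] = current, row
        else:
            older = row
        reason = f"duplicate {key}={k}; kept row dated {newest[k][ts]}"
        flagged.append({**older, "flag_reason": reason})
    return list(newest.values()), flagged
```

Returning the flagged rows alongside the kept ones, rather than discarding them, leaves a trail a reviewer can audit before the cleaned file is handed off.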
Outcomes we expect you to evidence
- Deliver a CLI that outputs flagged rows with reasons
- Document the cleanup assumptions your team approved
- Run the lab capstone on a masked sample from your desk
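A CLI that outputs flagged rows with reasons could look roughly like the sketch below. It uses only `argparse` and `csv` from the standard library. The `--required` option, the `row_number` and `flag_reason` columns, and the single "missing value" rule are illustrative assumptions; the capstone is likely to need more checks than this.

```python
import argparse
import csv


def flag_rows(rows, required):
    """Yield rows with an empty required column, annotated with a reason."""
    for row_number, row in enumerate(rows, start=1):  # 1 = first data row
        missing = [col for col in required if not (row.get(col) or "").strip()]
        if missing:
            yield {
                **row,
                "row_number": row_number,
                "flag_reason": "missing value in: " + ", ".join(missing),
            }


def main(argv=None):
    """Parse arguments, write flagged rows, and return how many were flagged."""
    parser = argparse.ArgumentParser(description="Emit flagged rows with reasons.")
    parser.add_argument("input_csv", help="path to the raw (masked) export")
    parser.add_argument("output_csv", help="where to write flagged rows")
    parser.add_argument("--required", nargs="+", default=[],
                        help="columns that must not be empty")
    args = parser.parse_args(argv)

    with open(args.input_csv, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["row_number", "flag_reason"]
        flagged = list(flag_rows(reader, args.required))

    with open(args.output_csv, "w", newline="", encoding="utf-8") as dst:
        # extrasaction="ignore" skips stray cells beyond the header width.
        writer = csv.DictWriter(dst, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(flagged)
    return len(flagged)
```

For example, `main(["export.csv", "flagged.csv", "--required", "order_id", "amount"])` writes every row that lacks an `order_id` or `amount`, with the reason spelled out (the file names here are placeholders). To run it from a scheduler, call `main()` under the usual `if __name__ == "__main__":` guard.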
Mentor anchor
Otávio Mendes
Data hygiene specialist who ships guardrails for spreadsheet-heavy teams.
Primary reviewer for this program track.
Participant questions
No, dashboards are not covered. The course targets pre-BI cleanup and reconciliation prep; dashboards stay in your existing stack.
Use masked samples inside the lab. Mentors never require production files.
We do not cover streaming pipelines; the focus stays on batch CSV workflows of up to a few million rows.
Recent peer notes
Mentor notes on my CSV Rescue Lab CLI were blunt in a useful way — rewrote two functions same night.
Flag column wording now travels with every file I send clients.