Cohort Analysis
A case study showing how I would analyze retention, activation, LTV, sales performance, and churn risk in a growth-stage B2B SaaS business.
Overview
This project uses a synthetic dataset for a fictional B2B workflow automation company.
The product is similar to Zapier, Workato, or n8n: teams use it to connect tools, automate repetitive work, and build no-code workflows.
The dataset covers 24 months of activity across 3,000 customers in SMB, Mid-Market, and Enterprise segments. Each segment has different pricing, retention behavior, activation patterns, and unit economics.
The goal is not just to show charts. The goal is to connect operating data to decisions: where the business is healthy, where the product is leaking, which segments are most valuable, which reps need attention, and which customers should be prioritized for retention.
What I built
The project runs end to end:
- synthetic B2B SaaS dataset
- Postgres schema and SQL analysis
- notebook exploration
- Google Sheets export
- Looker Studio dashboard
- custom Next.js dashboard
The Looker dashboard shows the analysis in a standard BI tool. The Next.js version rebuilds the same analysis with a stronger narrative flow and a more polished product-style interface.
The business questions
The analysis focuses on five operating questions:
- What does retention look like by cohort and segment?
- Where does the activation funnel break?
- Are the segments economically healthy?
- How are sales reps performing this quarter?
- Which active customers are most at risk of churning?
Key findings
1. Retention
Retention behaves as expected by segment.
SMB customers churn fastest, Mid-Market customers retain longer, and Enterprise customers are the stickiest. SMB average retention falls below 50% by month 14. Mid-Market retention lands at 56% by month 24. Enterprise retention stays above 70% across the full 24-month window.
Next action: investigate the SMB drop-off between months 6 and 14 using churn-reason analysis, exit interviews, and product-usage comparisons between retained and churned SMB customers.
2. Activation
Only 242 of 3,000 signups activate, an 8.1% activation rate.
The biggest issue is the final step in the funnel: workflow built → activated. Customers are willing to create a workflow, but many do not return to use it repeatedly enough to suggest long-term retention.
Earlier drop-offs are expected in B2B SaaS. The workflow-to-activation cliff is the important product issue.
Next action: review the first-workflow experience and identify what prevents repeated usage. Focus on setup friction, workflow quality, integrations, user education, and whether customers reach a clear first “aha” moment.
3. LTV
All three segments clear a 3x LTV/CAC threshold, but Enterprise is the clear standout.
Enterprise customers have much higher lifetime value and stronger retention, which means the business should consider reweighting acquisition resources toward Enterprise while continuing to test whether SMB acquisition remains attractive after churn.
Next action: shift more new-business investment toward Enterprise and diagnose whether SMB acquisition is still worth the retention profile.
4. Sales performance
Seven of ten reps cleared quota.
Enterprise reps materially outperformed, partly due to large deals closing in-quarter. SMB underperformance is more interesting because it is spread across multiple reps, suggesting the issue may be structural rather than individual.
Next action: audit SMB territory assignments and pipeline coverage to determine whether underperformance is driven by territory mix, lead quality, or top-of-funnel volume.
5. Churn risk
Of 1,973 active customers, 157 sit in high or severe risk bands. 43 are in the severe band.
The top 20 severe-risk customers are the immediate retention priority. The remaining severe-risk customers should be worked through in the next cycle.
Next action: Customer Success should contact all 43 severe-risk customers within 14 days, starting with the top 20. Reps should log the churn reason cited by each customer so the model and playbook can be refined.
Technical notes
- Data generation: Python, Faker, NumPy, pandas
- Database: Supabase / Postgres
- SQL: CTEs, window functions, cohort logic, conditional aggregation, and percentile calculations
- Analysis: Jupyter notebook
- Dashboarding: Google Sheets → Looker Studio, plus a custom Next.js dashboard
- Frontend: Next.js dashboard built to present the analysis with a clearer narrative and more polished UI