Project structure overview of retail analytics

Shared from "Retail-analytics-platform" on Inkdown

title: "Analytics Major project"
source: "https://claude.ai/chat/cd6ebc5f-fbf1-40d5-aa3e-f7512a12062c"
author:
published:
created: 2026-05-25
description: "Claude conversation with 12 messages"

You

I said it must be an end-to-end data analytics project, not a data science project. Go through this context and check: does this project use a data science and ML tech stack at its core, or do data analytics roles also ask for this tech stack in their JDs? If yes, create an artifact explaining each phase and the steps to complete it, with git commits for each phase, and explain which new core concepts to focus on during each phase.

Claude

First, let me directly answer your concern, then build the artifact.

Honest verdict on the project:

Phases 1–3 (Architecture, dbt, SQL) and Phases 6–7 (Dashboard, Production hardening): this is exactly what DA JDs ask for.

Phase 2: Transformation with dbt (Week 2, Pure Data Analytics)

Phase goal: Turn messy raw data into clean, business-ready tables. dbt is how modern analytics teams do this: in SQL, with version control, with tests. This layer is what your dashboard and SQL queries will read from.

Tools used: dbt Core, DuckDB adapter, SQL, YAML

Steps

1. Install dbt Core with the DuckDB adapter
pip install dbt-duckdb. Run dbt init retail_intelligence inside the dbt/ folder. Point profiles.yml at your warehouse/retail_warehouse.db file.

2. Build staging models (dbt/models/staging/)
stg_orders.sql: rename columns to snake_case, cast InvoiceDate to DATE, filter out cancelled orders (InvoiceNo starting with C).
stg_events.sql: parse the timestamp, map event codes to labels.
stg_marketing.sql: standardise channel names.

3. Build intermediate models (dbt/models/intermediate/)
int_orders_with_revenue.sql: add the calculated column TotalRevenue = Quantity × UnitPrice.
int_customer_orders.sql: aggregate orders per customer (order count, total spend, first/last order date).

4. Build mart models (dbt/models/marts/)
dim_customer.sql, dim_product.sql, dim_date.sql, fact_orders.sql, fact_events.sql, fact_marketing.sql. These are your final analytics-ready tables.

5. Write dbt schema tests in schema.yml
Give every key column the tests that fit it: not_null, unique, accepted_values. Example: InvoiceNo not_null; event_type accepted_values [view, addtocart, transaction]. Write one custom singular test: assert that no row in fact_orders has Quantity < 0.

6. Run dbt docs generate and screenshot the lineage DAG
dbt docs generate && dbt docs serve. Screenshot the full DAG showing the raw → staging → intermediate → mart flow. This goes in your README.
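This is a minimal sketch of what steps 2 and 3 compute. It runs from Python against an in-memory SQLite database, so it needs neither dbt nor DuckDB. Each SQL string stands for one model file. In real dbt these would be plain SELECTs that read from each other through ref(), not hand-created views.

The following are assumptions made for the example:
- the toy rows;
- the StockCode and CustomerID columns;
- SQLite's DATE() standing in for DuckDB's CAST(InvoiceDate AS DATE).

```python
import sqlite3

# Toy raw rows, made up for illustration, in an Online Retail-style layout.
# StockCode and CustomerID are assumed columns; the steps only name the others.
RAW_ORDERS = [
    ("536365", "2010-12-01 08:26:00", "85123A", 6, 2.55, "17850"),
    ("536366", "2010-12-01 08:28:00", "22633", 6, 1.85, "17850"),
    ("C536379", "2010-12-01 09:41:00", "D", -1, 27.50, "14527"),  # cancelled
    ("536370", "2010-12-01 08:45:00", "22728", 24, 3.75, "12583"),
]

# stg_orders: snake_case names, date cast, cancelled invoices (prefix C) removed.
# SQLite's DATE() stands in for DuckDB's CAST(InvoiceDate AS DATE).
STG_ORDERS = """
CREATE VIEW stg_orders AS
SELECT InvoiceNo         AS invoice_no,
       DATE(InvoiceDate) AS invoice_date,
       StockCode         AS stock_code,
       Quantity          AS quantity,
       UnitPrice         AS unit_price,
       CustomerID        AS customer_id
FROM raw_orders
WHERE InvoiceNo NOT LIKE 'C%'
"""

# int_orders_with_revenue: adds the calculated revenue column.
INT_ORDERS_WITH_REVENUE = """
CREATE VIEW int_orders_with_revenue AS
SELECT *, quantity * unit_price AS total_revenue
FROM stg_orders
"""

# int_customer_orders: one row per customer (order count, spend, first/last date).
INT_CUSTOMER_ORDERS = """
CREATE VIEW int_customer_orders AS
SELECT customer_id,
       COUNT(DISTINCT invoice_no)   AS order_count,
       ROUND(SUM(total_revenue), 2) AS total_spend,
       MIN(invoice_date)            AS first_order_date,
       MAX(invoice_date)            AS last_order_date
FROM int_orders_with_revenue
GROUP BY customer_id
"""


def build_models(raw_rows):
    """Load raw rows, then create the staging and intermediate layers as views."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE raw_orders (InvoiceNo TEXT, InvoiceDate TEXT, StockCode TEXT, "
        "Quantity INTEGER, UnitPrice REAL, CustomerID TEXT)"
    )
    con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?, ?, ?)", raw_rows)
    # Order matters: each view reads from the one created before it.
    for ddl in (STG_ORDERS, INT_ORDERS_WITH_REVENUE, INT_CUSTOMER_ORDERS):
        con.execute(ddl)
    return con


con = build_models(RAW_ORDERS)
customers = con.execute("SELECT * FROM int_customer_orders ORDER BY customer_id").fetchall()
print(customers)
```

The cancelled invoice never reaches the intermediate layer, so customer 14527 does not appear in int_customer_orders.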
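In dbt, a test is a query that selects the failing rows. The test passes when that query returns nothing. This sketch hand-writes rough equivalents of not_null, unique, accepted_values and the Quantity < 0 singular test, so you can see what each one checks.

The following are hypothetical:
- the helper functions;
- the table shapes;
- the rows, which include two deliberate failures.

The queries are simplifications, not the SQL dbt actually compiles.

```python
import sqlite3

# Each test selects the *failing* rows; an empty result means the test passes.
# Identifiers are interpolated into SQL, so only pass trusted table/column names.


def not_null(table, column):
    return f"SELECT * FROM {table} WHERE {column} IS NULL"


def unique(table, column):
    return (f"SELECT {column}, COUNT(*) FROM {table} WHERE {column} IS NOT NULL "
            f"GROUP BY {column} HAVING COUNT(*) > 1")


def accepted_values(table, column, values):
    allowed = ", ".join(f"'{v}'" for v in values)
    return f"SELECT {column} FROM {table} WHERE {column} NOT IN ({allowed})"


# Singular test from step 5: no row in fact_orders may have a negative quantity.
NO_NEGATIVE_QUANTITY = "SELECT invoice_no, quantity FROM fact_orders WHERE quantity < 0"

# Toy tables (hypothetical shapes and rows) containing two deliberate problems.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_orders (invoice_no TEXT, quantity INTEGER)")
con.execute("CREATE TABLE dim_customer (customer_id TEXT)")
con.execute("CREATE TABLE fact_events (visitor_id INTEGER, event_type TEXT)")
con.executemany("INSERT INTO fact_orders VALUES (?, ?)",
                [("536365", 6), ("536366", 6), ("536370", -2)])
con.executemany("INSERT INTO dim_customer VALUES (?)", [("12583",), ("17850",)])
con.executemany("INSERT INTO fact_events VALUES (?, ?)",
                [(1, "view"), (1, "addtocart"), (2, "click")])

tests = {
    "fact_orders.invoice_no not_null": not_null("fact_orders", "invoice_no"),
    "dim_customer.customer_id unique": unique("dim_customer", "customer_id"),
    "fact_events.event_type accepted_values": accepted_values(
        "fact_events", "event_type", ["view", "addtocart", "transaction"]),
    "fact_orders no negative quantity": NO_NEGATIVE_QUANTITY,
}
results = {name: con.execute(sql).fetchall() for name, sql in tests.items()}
for name, failures in results.items():
    print("PASS" if not failures else f"FAIL ({len(failures)} rows)", name)
```

unique is applied to dim_customer.customer_id here, not to InvoiceNo. In a line-level fact table one invoice spans several rows, so unique on InvoiceNo would fail by design.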
Git commits for this phase
feat: dbt init (project setup with DuckDB adapter)
feat: staging models (orders, events, marketing)
feat: intermediate models (revenue calc, customer aggregates)
feat: mart models (all dim and fact tables)
test: dbt schema tests (not_null, unique, custom singular)
docs: dbt lineage DAG screenshot added to README

Where to focus: new concepts this phase
Staging → intermediate → mart pattern: Staging = clean, typed, renamed. Intermediate = business logic and joins. Mart = final consumer tables. Each layer has one job.
Data quality testing in YAML: prefer declarative tests over ad-hoc SQL assertions. Ask: what would silently break my dashboard if this column had nulls?
Data lineage: being able to trace any KPI back to its source table. This is what dbt docs generates. Show it in every interview.
SQL modularity with ref(): write {{ ref('stg_orders') }} instead of hardcoded table names, and dbt rebuilds models in the correct order automatically.
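dbt works out the build order from the ref() calls in each model. This rough sketch imitates that idea; it is not how dbt is implemented. It works in three steps:
- scan hypothetical, abbreviated model bodies for single-quoted ref('...') calls;
- treat each call as a dependency edge;
- topologically sort the graph with the standard library's graphlib (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical, abbreviated model bodies; only the ref() calls matter here.
# Listed in reverse on purpose, so the sort has real work to do.
MODELS = {
    "fact_orders": "select * from {{ ref('int_orders_with_revenue') }} "
                   "join {{ ref('dim_customer') }} using (customer_id)",
    "dim_customer": "select * from {{ ref('int_customer_orders') }}",
    "int_customer_orders": "select customer_id, count(*) from "
                           "{{ ref('int_orders_with_revenue') }} group by 1",
    "int_orders_with_revenue": "select *, quantity * unit_price as total_revenue "
                               "from {{ ref('stg_orders') }}",
    "stg_orders": "select * from {{ source('raw', 'orders') }}",
}

# Matches single-quoted ref('model_name') calls and captures the model name.
REF_PATTERN = re.compile(r"ref\(\s*'([^']+)'\s*\)")


def dependencies(models):
    """Map each model name to the set of models it ref()s."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(models):
    """Return one valid build order: every model comes after the models it refs."""
    return list(TopologicalSorter(dependencies(models)).static_order())


order = build_order(MODELS)
print(" -> ".join(order))
```

The models are listed in reverse, yet stg_orders still comes out first and fact_orders last. This is the property that lets you stop hardcoding table names.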

Phase 2 done. Now do the same for Phase 3: keep a proper step-by-step execution order, with GitHub commits at the respective checkpoints and stages of this phase.

Phase 3: Advanced SQL Analytics (Week 3–4, Pure Data Analytics)

Phase goal: Answer the business questions that management cannot currently answer. Every query here maps to a real business decision. This is the most important phase for interviews: advanced SQL is tested in virtually every DA hiring process.

Tools used: DuckDB SQL, CTEs, Window functions, DBeaver

Steps

1. sql/kpi_summary.sql: Core business KPIs
Total Revenue, Total Orders, AOV (Revenue / Orders), Return Rate (cancelled invoices / total invoices), Unique Customers, Average Order Frequency. Run monthly and overall.

2. sql/cohort_retention.sql: Cohort Retention Matrix
Find each customer's first purchase with the window function MIN(InvoiceDate) per customer, and use DATE_TRUNC to turn it into a cohort month. Calculate months since first purchase. Pivot to a 12-month matrix. This is the hardest query; spend two days on it if needed.

3. sql/rfm_segmentation.sql: RFM Segmentation
Recency = days since last order. Frequency = distinct invoice count. Monetary = total spend. Apply NTILE(5) to each. Order recency descending, so the most recent customers score 5. Concatenate the R, F and M scores. Label: Champions, Loyal, At Risk, Lost.

4. sql/funnel_analysis.sql: Conversion Funnel
From fact_events, count distinct visitors at view → addtocart → transaction. Calculate the drop-off % at each stage. Find the biggest leak in the funnel.

5. sql/revenue_by_segment.sql: Product & Regional Analysis
Revenue by product category, revenue by country/region, top-10 products by revenue, top-10 products by return rate. Month-over-month revenue trend using LAG().

6. sql/marketing_performance.sql: Channel Efficiency
Revenue per channel, ROAS by channel, CPA by channel, month-over-month ROAS trend. Rank channels by efficiency using the RANK() window function.

7. sql/ab_test_analysis.sql: A/B Test in SQL
Simulate two groups by hashing visitorid. Calculate the conversion rate per group. Compute the z-score and p-value in pure SQL.
This is advanced and impressive, so do it last.

Git commits for this phase
feat(sql): core KPI summary (revenue, AOV, returns)
feat(sql): cohort retention matrix, 12-month window
feat(sql): RFM segmentation with NTILE(5) and labels
feat(sql): conversion funnel drop-off analysis
feat(sql): product and regional revenue breakdown
feat(sql): marketing channel efficiency with ROAS
feat(sql): A/B test z-score in pure SQL

Where to focus: new concepts this phase
Window functions (master these): ROW_NUMBER, RANK, NTILE, LAG, LEAD, FIRST_VALUE. The cohort query alone leans heavily on window functions, and they come up in almost every DA interview.
Cohort analysis logic: one of the most-asked advanced SQL concepts in DA interviews. Learn the DATE_TRUNC + MIN(first_purchase) pattern, and spend real time here.
KPI thinking: every query must answer a business question. Not just "revenue by month", but "which month had the highest AOV, and why might that be?"
A/B test in SQL: a rare skill; most DAs can't do this. If you can explain the z-score and p-value from first principles, you stand out immediately.
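This is a minimal sketch of the step 1 KPIs. It assumes a line-level orders table that still keeps cancelled invoices (prefix C), because Return Rate needs them and the Phase 2 staging model filters them out. It runs in SQLite from Python; in DuckDB you would likely swap strftime for DATE_TRUNC('month', invoice_date).

The following are assumptions made for the example:
- the table, its columns and its rows;
- Return Rate defined as cancelled invoices divided by all invoices;
- Average Order Frequency defined as orders per unique customer.

```python
import sqlite3

# Assumed line-level table that still contains cancelled invoices (prefix C).
# Toy rows, made up for illustration: (invoice_no, invoice_date, customer_id, revenue).
ORDERS = [
    ("536365", "2010-12-01", "17850", 15.0),
    ("536365", "2010-12-01", "17850", 5.0),
    ("536366", "2010-12-03", "13047", 20.0),
    ("C536367", "2010-12-05", "17850", -5.0),
    ("536400", "2011-01-04", "17850", 30.0),
]

KPI_SQL = """
WITH flagged AS (
    SELECT *, invoice_no LIKE 'C%' AS is_cancelled FROM orders
),
agg AS (
    SELECT {period} AS order_month,
           SUM(CASE WHEN NOT is_cancelled THEN revenue END)                AS net_revenue,
           COUNT(DISTINCT CASE WHEN NOT is_cancelled THEN invoice_no END)  AS n_orders,
           COUNT(DISTINCT CASE WHEN is_cancelled THEN invoice_no END)      AS n_cancelled,
           COUNT(DISTINCT invoice_no)                                      AS n_invoices,
           COUNT(DISTINCT CASE WHEN NOT is_cancelled THEN customer_id END) AS n_customers
    FROM flagged
    {group_by}
)
SELECT order_month,
       ROUND(net_revenue, 2)                    AS total_revenue,
       n_orders                                 AS total_orders,
       ROUND(net_revenue / n_orders, 2)         AS aov,
       ROUND(1.0 * n_cancelled / n_invoices, 3) AS return_rate,
       n_customers                              AS unique_customers,
       ROUND(1.0 * n_orders / n_customers, 2)   AS avg_order_frequency
FROM agg
ORDER BY order_month
"""


def kpis(con, by_month=True):
    """Monthly KPI rows, or a single 'overall' row when by_month is False."""
    if by_month:
        period, group_by = "strftime('%Y-%m', invoice_date)", "GROUP BY order_month"
    else:
        period, group_by = "'overall'", ""  # no GROUP BY: aggregate the whole table
    return con.execute(KPI_SQL.format(period=period, group_by=group_by)).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (invoice_no TEXT, invoice_date TEXT, "
            "customer_id TEXT, revenue REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)
for row in kpis(con) + kpis(con, by_month=False):
    print(row)
```

Passing by_month=False drops the GROUP BY, so the same query covers both "monthly and overall".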
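This sketches the step 2 cohort pattern in SQLite via Python. Window functions need SQLite 3.25 or newer. The query works as follows:
- DATE(x, 'start of month') plays the role of DATE_TRUNC('month', x);
- MIN(...) OVER (PARTITION BY customer_id) finds each customer's cohort month;
- the month index is computed from the year and month parts.

The pivot columns m0, m1, ... are generated in Python. Each one is the percentage of the cohort active in that month. The table name, columns and rows are toy assumptions.

```python
import sqlite3

COHORT_SQL = """
WITH customer_months AS (
    -- one row per customer per active month; 'start of month' acts like DATE_TRUNC
    SELECT DISTINCT customer_id, DATE(invoice_date, 'start of month') AS order_month
    FROM orders
),
with_cohort AS (
    SELECT customer_id, order_month,
           MIN(order_month) OVER (PARTITION BY customer_id) AS cohort_month
    FROM customer_months
),
with_index AS (
    SELECT customer_id, cohort_month,
           (CAST(strftime('%Y', order_month) AS INTEGER)
              - CAST(strftime('%Y', cohort_month) AS INTEGER)) * 12
           + CAST(strftime('%m', order_month) AS INTEGER)
           - CAST(strftime('%m', cohort_month) AS INTEGER) AS month_number
    FROM with_cohort
)
SELECT cohort_month, {pivot}
FROM with_index
GROUP BY cohort_month
ORDER BY cohort_month
"""


def cohort_matrix(con, n_months=12):
    """Retention % per cohort (rows) by months since first purchase (m0..m{n-1})."""
    # Within a cohort group, COUNT(DISTINCT customer_id) is the cohort size.
    pivot = ",\n       ".join(
        f"ROUND(100.0 * COUNT(DISTINCT CASE WHEN month_number = {k} THEN customer_id END)"
        f" / COUNT(DISTINCT customer_id), 1) AS m{k}"
        for k in range(n_months)
    )
    return con.execute(COHORT_SQL.format(pivot=pivot)).fetchall()


# Toy orders, made up for illustration: (customer_id, invoice_date).
ORDERS = [
    ("A", "2010-12-05"), ("A", "2011-01-10"), ("A", "2011-02-02"),
    ("B", "2010-12-20"),
    ("C", "2011-01-03"), ("C", "2011-01-25"), ("C", "2011-02-14"),
    ("D", "2011-01-15"),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id TEXT, invoice_date TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
for row in cohort_matrix(con, n_months=3):
    print(row)
```

m0 is always 100 by construction, because every customer is active in their own cohort month. That makes it a quick sanity check on the query.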
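This sketches step 3 in SQLite via Python. Watch the sort direction. Recency is ordered descending inside NTILE(5), so the most recent buyers get a 5. Frequency and monetary are ordered ascending.

The following are assumptions made for the example:
- the segment thresholds in the CASE, which are hypothetical because the step does not define how labels are assigned;
- an 'Other' bucket for customers that match none of the rules;
- the fixed snapshot date and the toy orders.

```python
import sqlite3

RFM_SQL = """
WITH customer_stats AS (
    SELECT customer_id,
           CAST(julianday(:snapshot) - julianday(MAX(invoice_date)) AS INTEGER) AS recency_days,
           COUNT(DISTINCT invoice_no) AS frequency,
           SUM(revenue)               AS monetary
    FROM orders
    GROUP BY customer_id
),
scored AS (
    SELECT customer_id,
           NTILE(5) OVER (ORDER BY recency_days DESC) AS r,  -- most recent gets 5
           NTILE(5) OVER (ORDER BY frequency)         AS f,  -- most frequent gets 5
           NTILE(5) OVER (ORDER BY monetary)          AS m   -- biggest spend gets 5
    FROM customer_stats
)
SELECT customer_id, r, f, m,
       CAST(r AS TEXT) || f || m AS rfm_score,
       CASE  -- hypothetical labelling rules; tune the thresholds to your data
           WHEN r >= 4 AND f >= 4 THEN 'Champions'
           WHEN r >= 3 AND f >= 3 THEN 'Loyal'
           WHEN r <= 2 AND f >= 3 THEN 'At Risk'
           WHEN r <= 2 THEN 'Lost'
           ELSE 'Other'
       END AS segment
FROM scored
ORDER BY customer_id
"""

# Toy spec, made up: customer -> (last order date, invoice count, revenue per invoice).
SPEC = {
    "c1": ("2011-12-08", 5, 100.0),
    "c2": ("2011-11-20", 3, 100.0),
    "c3": ("2011-05-24", 4, 100.0),
    "c4": ("2011-02-13", 2, 100.0),
    "c5": ("2011-12-05", 1, 100.0),
}
rows = [(cid, f"{cid}-{i}", day, amount)
        for cid, (day, n, amount) in SPEC.items() for i in range(n)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id TEXT, invoice_no TEXT, "
            "invoice_date TEXT, revenue REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
segments = con.execute(RFM_SQL, {"snapshot": "2011-12-10"}).fetchall()
for row in segments:
    print(row)
```

If recency were sorted ascending instead, your longest-lapsed customers would be labelled Champions. This is the most common RFM bug.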
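This sketches step 4 in SQLite via Python. It counts distinct visitors per stage, then uses LAG() to compare each stage with the one before it. Each stage is counted independently, as the step describes. A stricter funnel would count an addtocart visitor only if they also viewed. The toy events are made up.

```python
import sqlite3

FUNNEL_SQL = """
WITH stage_counts AS (
    SELECT event_type AS stage,
           CASE event_type WHEN 'view' THEN 1
                           WHEN 'addtocart' THEN 2
                           WHEN 'transaction' THEN 3 END AS step,
           COUNT(DISTINCT visitor_id) AS visitors
    FROM fact_events
    WHERE event_type IN ('view', 'addtocart', 'transaction')
    GROUP BY event_type
)
SELECT stage, visitors,
       -- % of the previous stage's visitors lost; NULL for the first stage
       ROUND(100.0 * (1 - 1.0 * visitors / LAG(visitors) OVER (ORDER BY step)), 1)
           AS drop_off_pct
FROM stage_counts
ORDER BY step
"""


def biggest_leak(rows):
    """Return the stage row with the largest drop-off from the stage before it."""
    return max((r for r in rows if r[2] is not None), key=lambda r: r[2])


# Toy events, made up: 10 viewers, 4 add to cart, 2 buy.
# Visitor 1 views twice, which is why the query counts DISTINCT visitors.
events = ([(v, "view") for v in range(1, 11)] + [(1, "view")]
          + [(v, "addtocart") for v in range(1, 5)]
          + [(v, "transaction") for v in range(1, 3)])
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_events (visitor_id INTEGER, event_type TEXT)")
con.executemany("INSERT INTO fact_events VALUES (?, ?)", events)
funnel = con.execute(FUNNEL_SQL).fetchall()
print(funnel)
print("biggest drop-off is into stage:", biggest_leak(funnel)[0])
```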
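This sketches the month-over-month trend from step 5. It aggregates revenue per month, then LAG() pulls the previous month's value onto the current row so the growth rate can be computed row by row. SQLite's strftime stands in for DuckDB's DATE_TRUNC. The orders table and its rows are toy assumptions.

```python
import sqlite3

MOM_SQL = """
WITH monthly AS (
    SELECT strftime('%Y-%m', invoice_date) AS order_month,
           SUM(revenue)                    AS month_revenue
    FROM orders
    GROUP BY order_month
)
SELECT order_month,
       month_revenue,
       LAG(month_revenue) OVER (ORDER BY order_month) AS prev_revenue,
       ROUND(100.0 * (month_revenue - LAG(month_revenue) OVER (ORDER BY order_month))
             / LAG(month_revenue) OVER (ORDER BY order_month), 1) AS mom_growth_pct
FROM monthly
ORDER BY order_month
"""

# Toy rows, made up: (invoice_date, revenue).
ORDERS = [
    ("2011-01-05", 120.0), ("2011-01-20", 80.0),
    ("2011-02-11", 250.0),
    ("2011-03-02", 200.0),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (invoice_date TEXT, revenue REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
trend = con.execute(MOM_SQL).fetchall()
for row in trend:
    print(row)
```

The first month has no predecessor, so LAG() returns NULL and the growth column stays NULL rather than showing a misleading 0%.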
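This sketches step 6 in SQLite via Python. It assumes fact_marketing has one row per channel per month with spend, revenue and conversions columns; the step does not name them. The query uses these definitions:
- ROAS = revenue / spend;
- CPA = spend / conversions;
- RANK() orders channels by ROAS.

The rows are made up.

```python
import sqlite3

CHANNEL_SQL = """
WITH by_channel AS (
    SELECT channel,
           SUM(spend)       AS total_spend,
           SUM(revenue)     AS total_revenue,
           SUM(conversions) AS total_conversions
    FROM fact_marketing
    GROUP BY channel
)
SELECT channel,
       ROUND(total_revenue / total_spend, 2)     AS roas,  -- revenue per unit of spend
       ROUND(total_spend / total_conversions, 2) AS cpa,   -- cost per acquisition
       RANK() OVER (ORDER BY total_revenue / total_spend DESC) AS efficiency_rank
FROM by_channel
ORDER BY efficiency_rank
"""

# Assumed columns and made-up rows: (month, channel, spend, revenue, conversions).
ROWS = [
    ("2011-01", "email", 50.0, 300.0, 8), ("2011-02", "email", 50.0, 500.0, 12),
    ("2011-01", "paid_search", 400.0, 1500.0, 20), ("2011-02", "paid_search", 600.0, 2500.0, 30),
    ("2011-01", "social", 200.0, 300.0, 4), ("2011-02", "social", 300.0, 700.0, 6),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_marketing (month TEXT, channel TEXT, spend REAL, "
            "revenue REAL, conversions INTEGER)")
con.executemany("INSERT INTO fact_marketing VALUES (?, ?, ?, ?, ?)", ROWS)
channels = con.execute(CHANNEL_SQL).fetchall()
for row in channels:
    print(row)
```

For the month-over-month ROAS trend, you would group by channel and month instead and apply LAG() with PARTITION BY channel. Ratios are computed from summed spend and revenue, not by averaging monthly ROAS, which would overweight small months.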
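Step 7 asks for this in pure SQL, but the arithmetic is identical in any language. This sketch shows it in Python, where math.erfc gives the normal tail probability directly. Many SQL dialects have no built-in normal CDF, so a pure-SQL version usually needs a numeric approximation for the p-value. The test shown is a standard pooled two-proportion z-test.

The following are assumptions made for the example:
- splitting visitors by the parity of a salted MD5 hash;
- the salt value;
- the conversion counts.

```python
import hashlib
import math


def assign_group(visitor_id, salt="exp-1"):
    """Deterministic 50/50 split: hash the salted visitor id and use its parity."""
    digest = hashlib.md5(f"{salt}:{visitor_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 2 else "A"


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    z = (p_b - p_a) / sqrt(p * (1 - p) * (1/n_a + 1/n_b)),
    where p = (conv_a + conv_b) / (n_a + n_b) is the pooled conversion rate.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts, made up: 200/2000 conversions in A, 250/2000 in B.
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The split is a pure function of visitor id and salt. Re-running the query always puts a visitor in the same group, and changing the salt reshuffles the groups for a new experiment.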