Saturday, November 29, 2025

Top SQL Patterns from FAANG Data Science Interviews (with Code)


SQL Patterns from FAANG Data Science Interviews
Image by Author

 

Introduction

 
The technical screening for data science roles at FAANG companies is very thorough. However, even they can’t come up with an endless stream of unique interview questions. Once you’ve gone through the grind enough times, you start to notice that some SQL patterns keep showing up.

Here are the top five, with examples and code (PostgreSQL) for practice.

 

SQL Patterns from FAANG Data Science Interviews
Image by Author | Napkin AI

 

Master these and you’ll be ready for most SQL interviews.

 

Pattern #1: Aggregating Data with GROUP BY

 
Using aggregate functions with GROUP BY lets you compute metrics across categories.

This pattern is often combined with data filtering, which means using one of two clauses:

  • WHERE: Filters rows before the aggregation.
  • HAVING: Filters groups after the aggregation.
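
To make the distinction concrete, here’s a minimal sketch. It uses Python’s built-in sqlite3 module only so it runs anywhere; the queries themselves are standard SQL that behaves the same way in PostgreSQL, and the sales table and its values are invented for illustration:

```python
import sqlite3

# Hypothetical sales table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('east', 100), ('east', 50), ('west', 30), ('west', 100), ('north', 200);
""")

# WHERE filters rows BEFORE aggregation: only rows with amount >= 100 are summed.
pre = conn.execute("""
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount >= 100
GROUP BY region
ORDER BY region;
""").fetchall()

# HAVING filters groups AFTER aggregation: only regions whose total >= 150 survive.
post = conn.execute("""
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING SUM(amount) >= 150
ORDER BY region;
""").fetchall()

print(pre)   # [('east', 100), ('north', 200), ('west', 100)]
print(post)  # [('east', 150), ('north', 200)]
```

Note how ‘east’ (rows of 100 and 50) sums to only 100 under the WHERE filter, while HAVING keeps its full total of 150 and instead drops ‘west’, whose overall total of 130 falls short.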

Example: This Meta interview question asks you to find the total number of comments made 30 or fewer days before 2020-02-10 per user. Users with no comments should be excluded from the output.

We use the SUM() function with a GROUP BY clause to sum the number of comments per user. Outputting only the comments within the specified period is achieved by filtering the data before aggregation, i.e., using WHERE. There’s no need to calculate which date is “30 days before 2020-02-10”; we simply subtract 30 days from that date using an INTERVAL.

SELECT user_id,
       SUM(number_of_comments) AS number_of_comments
FROM fb_comments_count
WHERE created_at BETWEEN '2020-02-10'::DATE - 30 * INTERVAL '1 day' AND '2020-02-10'::DATE
GROUP BY user_id;

 

Here’s the output.
 

user_id number_of_comments
5 1
8 4
9 2
99 2

 

Business Use:

  • User activity metrics: DAU & MAU, churn rate.
  • Revenue metrics: revenue per region/product/time period.
  • User engagement: average session length, average clicks per user.

 

Pattern #2: Filtering with Subqueries

 
When using subqueries for filtering, you create a data subset, then filter the main query against it.

The two main subquery types are:

  • Scalar subqueries: Return a single value, e.g., a maximum amount.
  • Correlated subqueries: Reference and depend on the result of the outer query to return their values.
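
A quick sketch of both types, again via Python’s sqlite3 just so it runs anywhere; the toy_orders table is invented and unrelated to the interview questions in this article:

```python
import sqlite3

# Invented toy_orders table for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE toy_orders (id INTEGER, cust TEXT, amount INTEGER);
INSERT INTO toy_orders VALUES
  (1, 'ann', 40), (2, 'ann', 90), (3, 'bob', 10), (4, 'bob', 5);
""")

# Scalar subquery: evaluated once, returns a single value (the global maximum).
top = conn.execute("""
SELECT id, cust, amount
FROM toy_orders
WHERE amount = (SELECT MAX(amount) FROM toy_orders);
""").fetchall()

# Correlated subquery: re-evaluated for every outer row because it references
# o.cust, so each customer is compared against their own maximum.
per_cust = conn.execute("""
SELECT id, cust, amount
FROM toy_orders o
WHERE amount = (SELECT MAX(amount) FROM toy_orders WHERE cust = o.cust)
ORDER BY id;
""").fetchall()

print(top)       # [(2, 'ann', 90)]
print(per_cust)  # [(2, 'ann', 90), (3, 'bob', 10)]
```

The scalar version keeps only the single largest order overall, while the correlated version keeps the largest order per customer.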

Example: This interview question from Meta asks you to create a recommendation system for Facebook. For each user, you need to find pages that this user doesn’t follow, but at least one of their friends does. The output should consist of the user ID and the ID of the page that should be recommended to this user.

The outer query returns all the user-page pairs where the page is followed by at least one friend.

Then, we use a subquery in the WHERE clause to remove the pages that the user already follows. There are two conditions in the subquery: one restricts the rows to this specific user (checks for this user only), and the other checks whether the page considered for recommendation is among that user’s followed pages (checks for this page only).

Since the subquery returns all the pages followed by the user, using NOT EXISTS in WHERE excludes all those pages from the recommendations.

SELECT DISTINCT f.user_id,
                p.page_id
FROM users_friends f
JOIN users_pages p ON f.friend_id = p.user_id
WHERE NOT EXISTS
    (SELECT *
     FROM users_pages pg
     WHERE pg.user_id = f.user_id
       AND pg.page_id = p.page_id);

 

Here’s the output.
 

user_id page_id
1 23
1 24
1 28
5 25

 

Business Use:

  • Customer activity: most recent login per user, latest subscription change.
  • Sales: highest order per customer, top revenue order per region.
  • Product performance: most purchased product in each category, highest-revenue product per month.
  • User behaviour: longest session per user, first purchase per customer.
  • Reviews & feedback: top reviewer, latest review for each product.
  • Operations: latest shipment status per order, fastest delivery time per region.

 

Pattern #3: Ranking with Window Functions

 
Using window functions such as ROW_NUMBER(), RANK(), and DENSE_RANK() lets you order rows within data partitions, and then identify the first, second, or nth record.

Here’s what each of these ranking window functions does:

  • ROW_NUMBER(): Assigns a unique sequential number within each partition; tied values get different row numbers.
  • RANK(): Assigns the same rank to tied values and skips the following ranks for the next non-tied value.
  • DENSE_RANK(): Same as RANK(), only it doesn’t skip ranks after ties.
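
The three functions only differ on ties, which a minimal sketch makes visible (Python’s sqlite3 for portability; SQLite 3.25+ supports the same window functions as PostgreSQL, and the scores table is made up):

```python
import sqlite3

# Made-up scores table with a tie at 90 to expose the difference.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (name TEXT, score INTEGER);
INSERT INTO scores VALUES ('a', 90), ('b', 90), ('c', 80), ('d', 70);
""")

rows = conn.execute("""
SELECT name,
       score,
       -- name is added as a tiebreaker only to keep row numbers deterministic
       ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM scores
ORDER BY score DESC, name;
""").fetchall()

for r in rows:
    print(r)
# ('a', 90, 1, 1, 1)
# ('b', 90, 2, 1, 1)   <- tied: new row number, same rank
# ('c', 80, 3, 3, 2)   <- RANK skips rank 2, DENSE_RANK doesn't
# ('d', 70, 4, 4, 3)
```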

Example: In an Amazon interview question, we need to find the highest daily order cost between 2019-02-01 and 2019-05-01. If a customer has more than one order on a given day, sum the order costs for that day. The output should contain the customer’s first name, the total cost of their order(s), and the date of the order.

In the first common table expression (CTE), we find the orders between the specified dates and sum each customer’s daily total for each date.

In the second CTE, we use RANK() to rank customers by order cost in descending order for each date.

Now, we join the two CTEs to output the required columns and keep only the orders with the first rank assigned to them, i.e., the highest daily cost.

WITH customer_daily_totals AS (
  SELECT o.cust_id,
         o.order_date,
         SUM(o.total_order_cost) AS total_daily_cost
  FROM orders o
  WHERE o.order_date BETWEEN '2019-02-01' AND '2019-05-01'
  GROUP BY o.cust_id, o.order_date
),

ranked_daily_totals AS (
  SELECT cust_id,
         order_date,
         total_daily_cost,
         RANK() OVER (PARTITION BY order_date ORDER BY total_daily_cost DESC) AS rnk
  FROM customer_daily_totals
)

SELECT c.first_name,
       rdt.order_date,
       rdt.total_daily_cost AS max_cost
FROM ranked_daily_totals rdt
JOIN customers c ON rdt.cust_id = c.id
WHERE rdt.rnk = 1
ORDER BY rdt.order_date;

 

Here’s the output.
 

first_name order_date max_cost
Mia 2019-02-01 100
Farida 2019-03-01 80
Mia 2019-03-01 80
Farida 2019-04-23 120

 

Business Use:

  • User activity: “Top 5 most active users last month”.
  • Revenue: “The second-highest revenue region”.
  • Product popularity: “Top 10 best-selling products”.
  • Purchases: “The first purchase of each customer”.

 

Pattern #4: Calculating Moving Averages & Cumulative Sums

 
The moving (rolling) average calculates the average over the last N rows, typically months or days. It’s calculated using the AVG() window function and defining the window frame as ROWS BETWEEN N PRECEDING AND CURRENT ROW.

The cumulative sum (running total) is the sum from the first row up to the current row, which is reflected in defining the window frame as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW in the SUM() window function.
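
The two frame definitions can be compared side by side on a toy series (Python’s sqlite3 for portability; the revenue table and its numbers are invented):

```python
import sqlite3

# Invented monthly revenue series.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revenue (month TEXT, amount INTEGER);
INSERT INTO revenue VALUES
  ('2020-01', 10), ('2020-02', 20), ('2020-03', 30), ('2020-04', 40);
""")

rows = conn.execute("""
SELECT month,
       -- moving average over the current row and the two before it
       AVG(amount) OVER (ORDER BY month
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS mov_avg,
       -- running total from the first row up to the current row
       SUM(amount) OVER (ORDER BY month
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_sum
FROM revenue
ORDER BY month;
""").fetchall()

for r in rows:
    print(r)
# ('2020-01', 10.0, 10)
# ('2020-02', 15.0, 30)
# ('2020-03', 20.0, 60)
# ('2020-04', 30.0, 100)   <- average of 20, 30, 40; sum of all four months
```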

Example: This interview question from Amazon wants us to find the 3-month rolling average of total revenue from purchases. We should output the year-month (YYYY-MM) and the 3-month rolling average, sorted from the earliest to the latest month.

Also, returns (negative purchase values) shouldn’t be included.

We use a subquery to calculate monthly revenue using SUM() and convert the purchase date to the YYYY-MM format with the TO_CHAR() function.

Then, we use AVG() to calculate the moving average. In the OVER() clause, we order the rows by month and define the window as ROWS BETWEEN 2 PRECEDING AND CURRENT ROW; this gives the 3-month moving average, which takes into account the current and the two previous months.

SELECT t.month,
       AVG(t.monthly_revenue) OVER(ORDER BY t.month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_revenue
FROM
  (SELECT TO_CHAR(created_at::DATE, 'YYYY-MM') AS month,
          SUM(purchase_amt) AS monthly_revenue
   FROM amazon_purchases
   WHERE purchase_amt > 0
   GROUP BY 1
   ORDER BY 1) AS t
ORDER BY t.month ASC;

 

Here’s the output.
 

month avg_revenue
2020-01 26292
2020-02 23493.5
2020-03 25535.666666666668
2020-10 21211

 

To calculate a cumulative sum, we’d do it like this.

SELECT t.month,
       SUM(t.monthly_revenue) OVER(ORDER BY t.month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_sum
FROM
  (SELECT TO_CHAR(created_at::DATE, 'YYYY-MM') AS month,
          SUM(purchase_amt) AS monthly_revenue
   FROM amazon_purchases
   WHERE purchase_amt > 0
   GROUP BY 1
   ORDER BY 1) AS t
ORDER BY t.month ASC;

 

Here’s the output.
 

month cum_sum
2020-01 26292
2020-02 46987
2020-03 76607
2020-10 239869

 

Business Use:

  • Engagement metrics: 7-day moving average of DAU or messages sent, cumulative cancellations.
  • Financial KPIs: 30-day moving average of costs/conversions/stock prices, revenue reporting (cumulative YTD).
  • Product performance: moving average of logins per user, cumulative app installs.
  • Operations: cumulative orders shipped, tickets resolved, bugs closed.

 

Pattern #5: Applying Conditional Aggregations

 
Conditional aggregation lets you compute multiple segmented metrics in a single pass by putting CASE WHEN statements inside aggregate functions.
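
In its simplest form, the pattern looks like this (Python’s sqlite3 for portability; the signups table and its rows are invented):

```python
import sqlite3

# Invented signups table: one row per user with platform and a paid flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE signups (user_id INTEGER, platform TEXT, paid INTEGER);
INSERT INTO signups VALUES
  (1, 'ios', 1), (2, 'ios', 0), (3, 'android', 1),
  (4, 'android', 1), (5, 'web', 0);
""")

# One pass over the table yields several segmented metrics at once.
row = conn.execute("""
SELECT COUNT(*) AS total_users,
       SUM(CASE WHEN paid = 1 THEN 1 ELSE 0 END) AS paid_users,
       SUM(CASE WHEN platform = 'ios' THEN 1 ELSE 0 END) AS ios_users,
       SUM(CASE WHEN platform = 'android' AND paid = 1 THEN 1 ELSE 0 END) AS paid_android
FROM signups;
""").fetchone()

print(row)  # (5, 3, 2, 2)
```

Without conditional aggregation, each of these counts would need its own filtered query or a separate scan of the table.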

Example: A question from an Amazon interview asks you to identify returning active users by finding users who made a second purchase within 1 to 7 days after their first purchase. The output should consist only of these users’ IDs. Same-day purchases should be ignored.

The first CTE identifies the users and their purchase dates, collapsing same-day purchases into one row by using the DISTINCT keyword.

The second CTE ranks each user’s purchase dates from the oldest to the newest.

The last CTE finds the first and second purchases for each user by using conditional aggregation. We use MAX() to pick the single non-NULL value for the first and second purchase dates.

Finally, we use the result of the last CTE and retain only users who made a second purchase (non-NULL) within 7 days of their first purchase.

WITH daily AS (
  SELECT DISTINCT user_id,
         created_at::DATE AS purchase_date
  FROM amazon_transactions
),

ranked AS (
  SELECT user_id,
         purchase_date,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY purchase_date) AS rn
  FROM daily
),

first_two AS (
  SELECT user_id,
         MAX(CASE WHEN rn = 1 THEN purchase_date END) AS first_date,
         MAX(CASE WHEN rn = 2 THEN purchase_date END) AS second_date
  FROM ranked
  WHERE rn <= 2
  GROUP BY user_id
)

SELECT user_id
FROM first_two
WHERE second_date IS NOT NULL AND (second_date - first_date) BETWEEN 1 AND 7
ORDER BY user_id;

 

Here’s the output.
 

user_id
100
103
105
143

 

Business Use:

  • Subscription reporting: paid vs. free users, active vs. churned users by plan tier.
  • Marketing funnel dashboards: signed-up vs. purchased users by traffic source, emails opened vs. clicked vs. converted.
  • E-commerce: completed vs. refunded vs. cancelled orders by region, new vs. returning buyers.
  • Product analysis: iOS vs. Android vs. Web usage, feature adopted vs. not adopted counts per cohort.
  • Finance: revenue from new vs. existing customers, gross vs. net revenue.
  • A/B testing & experiments: control vs. treatment metrics.

 

Conclusion

 
If you want a job at FAANG companies (and others, too), focus on these five SQL patterns for the interviews. Of course, they’re not the only SQL concepts tested. But they’re the most commonly tested ones. By focusing on them, you make sure that your interview preparation is as efficient as possible for most SQL interviews at FAANG companies.
 
 

Nate Rosidi is a data scientist and works in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.


