【编程学习】数据分析岗常见SQL面试题

6Young2024-03-192024-12-23

不点任何东西的顾客

Customers 表：

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
+-------------+---------+
在 SQL 中，id 是该表的主键。
该表的每一行都表示客户的 ID 和名称。

Orders 表：

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| customerId  | int  |
+-------------+------+
在 SQL 中，id 是该表的主键。
customerId 是 Customers 表中 ID 的外键( Pandas 中的连接键)。
该表的每一行都表示订单的 ID 和订购该订单的客户的 ID。

找出所有从不点任何东西的顾客。

以 任意顺序 返回结果表。

select name Customers
from Customers
where Customers.id not in (
    select customerId from Orders
)

select name Customers
from Customers left join Orders
    on Customers.id = Orders.customerId
where Orders.customerId is Null

第二高的薪水

Employee 表：

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| salary      | int  |
+-------------+------+
在 SQL 中，id 是这个表的主键。
表的每一行包含员工的工资信息。

查询并返回 Employee 表中第二高的薪水。如果不存在第二高的薪水，查询应该返回 null(Pandas 则返回 None) 。

select ifnull(
    (select max(distinct salary) SecondHighestSalary
    from Employee
    where salary < (
        select max(distinct salary)
        from Employee
    )), null) as SecondHighestSalary;

select ifnull(
    (select distinct salary SecondHighestSalary
    from Employee
    order by salary desc
    limit 1, 1)
,null) as SecondHighestSalary;

分数排名

Scores表:

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| score       | decimal |
+-------------+---------+
在 SQL 中，id 是该表的主键。
该表的每一行都包含了一场比赛的分数。Score 是一个有两位小数点的浮点值。

查询并对分数进行排序。排名按以下规则计算:

分数应按从高到低排列。
如果两个分数相等，那么两个分数的排名应该相同。
在排名相同的分数后，排名数应该是下一个连续的整数。换句话说，排名之间不应该有空缺的数字。

按 score 降序返回结果表。

1 2	select score, dense_rank() over(order by score desc) as 'rank' from Scores;

其他：

select score, rank() over(order by score desc) as 'rank'
from Scores;

-- | score | rank |
-- | ----- | ---- |
-- | 4     | 1    |
-- | 4     | 1    |
-- | 3.85  | 3    |
-- | 3.65  | 4    |
-- | 3.65  | 4    |
-- | 3.5   | 6    |

select score, row_number() over(order by score desc) as 'rank'
from Scores;

-- | score | rank |
-- | ----- | ---- |
-- | 4     | 1    |
-- | 4     | 2    |
-- | 3.85  | 3    |
-- | 3.65  | 4    |
-- | 3.65  | 5    |
-- | 3.5   | 6    |

连续出现的数字

表：Logs

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| num         | varchar |
+-------------+---------+
在 SQL 中，id 是该表的主键。
id 是一个自增列。

找出所有至少连续出现三次的数字。

返回的结果表中的数据可以按 任意顺序 排列。

select distinct a.num ConsecutiveNums
from Logs a, Logs b, Logs c
where a.id = b.id+1 and b.id = c.id+1 and a.num=b.num and a.num=c.num

select distinct num ConsecutiveNums  from (
    select num, row_number() over(order by id) - row_number() over(partition by num order by id) as rk
    from Logs
) tmp_table group by rk, num
having count(*) >= 3

指标计算

-- 计算日活人数
select login_date, count(distinct user_id) DAU
from login_table
group by login_date;

-- 计算次日留存率
select a.login_date, count(distinct b.login_date)/count(distinct a.login_date) 次日留存率
from login_table a left join login_table b
on datediff(a.login_date, b.login_date) = -1 and a.user_id = b.user_id
group by a.login_date

-- 考虑新注册的留存率
select distinct reg_date, count(login_date)/count(reg_date)
from (
    select distinct user_id, a.login_date reg_date
    from login_table a left join login_table b
    on b.login_date < a.login_date
    where b.login_date is Null
) reg_table left join login_table
on datediff(reg_table.reg_date, login_table.login_date) = -1 and reg_table.user_id = login_table.user_id

SQL行转列

现有一张表student_score，其中有三列student_id、subject和 score。

	student_id	subject	score
1	1001	语文	88
2	1001	数学	90
3	1001	英语	86
4	1002	语文	99
5	1002	数学	74
6	1002	英语	86

现在，我们想将其转变为：

	student_id	语文	数学	英语
1	1001	88	90	86
2	1002	99	74	86

select * from student_score
pivot (
    sum (score)
    for subject in ([语文],[数学],[英语])
) as score_per_subject