洛谷P3975 [TJOI2015] 弦论题解

题目链接：P3975 [TJOI2015] 弦论

题意：

为了提高智商，cxy 开始学习弦论。

这一天，她在《String theory》中看到了这样一道问题：

对于一个给定的长度为 \(n\) 的字符串，求出它的第 \(k\) 小子串是什么。你能帮帮她吗?

输入格式:

第一行是一个仅由小写英文字母构成的字符串 \(s\)。

第二行为两个整数 \(t\) 和 \(k\) ，

\(t\) 为 \(0\) 则表示不同位置的相同子串算作一个。

\(t\) 为 \(1\) 则表示不同位置的相同子串算作多个。

\(k\) 的意义见题目描述。

输出格式：

输出数据仅有一行，该行有一个字符串，为第 \(k\) 小的子串。若子串数目不足 \(k\) 个，则输出 \(-1\)。

数据范围：

\(1\leq n \leq 5 \times 10^5,~0\leq t \leq 1,~1\leq k \leq 10^9\)。

\(\mathcal{Part}\ 0\)

看题解区全是 SAM 的解法，少有的 SA 解法也都是 \(\mathcal{O}(n \log n)\) 的

那么我就来讲一种理论复杂度 \(\mathcal{O}(n)\) 的 SA 解法吧，思路来源于参考文献1。

\(\mathcal{Part}\ 1\)

第一问~~非常简单~~，就是 P2408 不同子串个数，由于每个排名为 \(i\) 的后缀贡献为 \[ n - \operatorname{sa}(i) + 1 - \operatorname{height}(i) \] 所以我们只需要像平衡树找 \(k\) 小值一样，每次如果大于就减去贡献，否则就是当前串。

这一部分的代码：

for(int i = 1; i <= n; i++)
{
    if(n - sa[i] - height[i] + 1 < id) { id -= n - sa[i] - height[i] + 1; }
    else {
        for(int j = 0; j < height[i] + id; j++) cout << s[sa[i] + j];
        cout << '\n'; exit(0); // 这段代码真的好难看（大雾）
    }
}
cout << "-1\n";

\(\mathcal{Part}\ 2\)

举个例子，\(S = \texttt{aabbababb}\) ，它的后缀数组如下：（即 \(\rm sa\) 数组） \[ \begin{array}{|l|l|l|l|l|l|l|l|l|} \hline \text { 1 } & \text { 2 } & \text { 3 } & \text { 4 } & \text { 5 } & \text { 6 } & \text { 7 } & \text { 8 } & \text { 9 } \\ \hline \text { b } & & & & & & & & \\ \hline \text { b } & & & \text { b } & & & & & \\ \hline \text { a } & & & \text { b } & & & & & \text { b } \\ \hline \text { b } & & & \text { a } & & \text { b } & & & \text { b } \\ \hline \text { a } & \text { b } & & \text { b } & & \text { b } & & & \text { a } \\ \hline \text { b } & \text { b } & & \text { a } & & \text { a } & \text { b } & & \text { b } \\ \hline \text { b } & \text { a } & \text { b } & \text { b } & & \text { b } & \text { b } & & \text { a } \\ \hline \text { a } & \text { b } & \text { b } & \text { b } & & \text { a } & \text { a } & \text { b } & \text { b } \\ \hline \text { a } & \text { a } & \text { a } & \text { a } & \text { b } & \text { b } & \text { b } & \text { b } & \text { b } \\ \hline \end{array} \] 有没有发现什么神奇的地方？观察每一行的 \(\tt a\) 和 \(\tt b\) ，他们实际上将整个后缀数组构成了一棵树。

这棵树有一些有趣的性质

每个节点以区间 \([l,r]\) 中第一个 height 值最小的位置为分界点（证明显然）。
由性质一对任何节点均满足可知，这棵树的权值满足小根堆性质。
对这棵树的先序遍历，那么访问到的节点所代表的公共前缀是按照字典序排序的。
每个节点可以产生 \((r-l+1)\times(\operatorname{LCP}(l,r) - \operatorname{LCP}(fa(l),\,fa(r)))\) 个不同的子串。
由性质四和性质一可知子串个数就是 \((r-l+1)\times(\operatorname{height}(u) - \operatorname{height}(fa(u)))\) 。

实际上我们可以将这棵树进一步缩减成一棵笛卡尔树。

其中根节点表示区间 \([2,n]\) 的第一个 height 最小值的位置（编号 \(1\) 的 height 没有意义）

其余的节点，如果是左儿子则表示区间 \([fa(l),\, x-1]\) ，如果是右儿子则表示区间 \([x,fa(r)]\) 。

那么每次询问就可以以先序遍历的方式访问节点，减去每个节点的贡献，直到减不动就输出答案。

查询部分的代码：

void solve(int p, int l, int r, int last)
{
    if(l >= r) // 走到不存在的节点
    {
        if(id > n - sa[r] + 1 - last) id -= n - sa[r] + 1 - last;
        else
        {
            for(int j = 0; j < id + last; j++) cout << s[sa[r] + j];
            cout << '\n'; exit(0);
        }
        return;
    }
    if(id > (height[p] - last) * (r - l + 1)) {
        id -= (height[p] - last) * (r - l + 1);
    }else
    {
        for(int j = 0; j < last; j++) cout << s[sa[r] + j];
        for(int j = last; id > 0; id -= (r - l + 1), j++) cout << s[sa[r] + j];
        cout << '\n'; exit(0);
    }
    solve(ls[p], l, p - 1, height[p]); solve(rs[p], p, r, height[p]); // 先序遍历
}

\(\mathcal{Part}\ 3\)

~~嗯，感觉这部分不算 Part 2 ，不如就叫 Part 3 吧。~~

时间复杂度 \(\mathcal{O}(n \log n)\) 或 \(\mathcal{O}(n)\) ，取决于建后缀数组 SA 的方式

代码：（~~好懒啊不想写 DC3，而且我也不会~~）

#include <bits/stdc++.h>
using namespace std;
// #define int long long
#define INF 0x3f3f3f3f
typedef long long ll;
void up(int &x, int y) { x < y ? x = y : 0; }
void down(int &x, int y) { x > y ? x = y : 0; }
#define N ((int)(5e5 + 55))
#define mem(a) memset(a, 0, sizeof(a))
#define check(i, w) (tmp[sa[i]] == tmp[sa[i - 1]] && tmp[sa[i] + w] == tmp[sa[i - 1] + w])

char s[N];
int id, type, top, stk[N], ls[N], rs[N];
int n, sa[N], rk[N * 2], tmp[N * 2], height[N], cnt[N];
void sort(const int m, int w)
{
    memset(cnt, 0, sizeof(int) * (m + 5));
    for(int i = 1; i <= n; i++) tmp[i] = sa[i];
    for(int i = 1; i <= n; i++) ++cnt[rk[tmp[i] + w]];
    for(int i = 1; i <= m; i++) cnt[i] += cnt[i - 1];
    for(int i = n; i; i--) sa[cnt[rk[tmp[i] + w]]--] = tmp[i];
}
void getheight()
{
    int k = 0;
    for(int i = 1; i <= n; height[rk[i]] = k, i++)
    {
        const int j = sa[rk[i] - 1]; if(k) --k;
        while(i + k <= n && j + k <= n && s[i + k] == s[j + k]) ++k;
    }
}
void getsa()
{
    const int m = max(n, 233);
    for(int i = 1; i <= n; i++) { sa[i] = i, rk[i] = s[i]; }
    for(int w = 1; w < n; w *= 2)
    {
        sort(m, w); sort(m, 0);
        for(int i = 1; i <= n; i++) tmp[i] = rk[i];
        for(int i = 1, p = 0; i <= n; i++)
            if(check(i, w)) rk[sa[i]] = p; else rk[sa[i]] = ++p;
    }
}
void solve(int p, int l, int r, int last)
{
    // cout << l << ' ' << r << '\n';
    if(l >= r)
    {
        if(id > n - sa[r] + 1 - last) id -= n - sa[r] + 1 - last;
        else
        {
            for(int j = 0; j < id + last; j++) cout << s[sa[r] + j];
            cout << '\n'; exit(0);
        }
        return;
    }
    if(id > (height[p] - last) * (r - l + 1)) {
        id -= (height[p] - last) * (r - l + 1);
    }else
    {
        for(int j = 0; j < last; j++) cout << s[sa[r] + j];
        for(int j = last; id > 0; id -= (r - l + 1), j++) cout << s[sa[r] + j];
        cout << '\n'; exit(0);
    }
    solve(ls[p], l, p - 1, height[p]); solve(rs[p], p, r, height[p]);
}
signed main()
{
    ios::sync_with_stdio(0);
    cin.tie(0); cout.tie(0);
    // freopen("check.in","r",stdin);
    // freopen("check.out","w",stdout);
    cin >> (s + 1) >> type >> id;
    n = strlen(s + 1); getsa(); getheight();
    if(type == 0)
    {
        for(int i = 1; i <= n; i++)
        {
            if(n - sa[i] - height[i] + 1 < id) { id -= n - sa[i] - height[i] + 1; }
            else {
                for(int j = 0; j < height[i] + id; j++) cout << s[sa[i] + j];
                cout << '\n'; exit(0);
            }
        }
        cout << "-1\n";
    }else
    {
        int mn = INF, rt = 0;
        for(int i = 2; i <= n; i++)
        {
            int last = top;
            while(top > 0 && height[stk[top]] > height[i]) --top;
            if(top > 0) rs[stk[top]] = i;
            if(top < last) ls[i] = stk[top + 1];
            stk[++top] = i; if(height[i] < mn) { mn = height[i]; rt = i; }
        }
        solve(rt, 1, n, 0); cout << "-1\n";
    }
    return 0;
}