# Baidu Netdisk User Service Research
Date: 2026-05-11
Implementation note: the first UniDesk-integrated version now lives in this repo as a main-server private user service, with backend source at `src/components/microservices/baidu-netdisk`, Compose service `baidu-netdisk`, config id `baidu-netdisk`, and frontend page `src/components/frontend/src/baidu-netdisk.tsx`. It keeps the recommended v1 boundary: JSON control API through UniDesk microservice proxy, OAuth Device Code login, PostgreSQL-backed encrypted token/task state, and staging-directory upload/download jobs instead of browser byte streaming.
Environment setup note: Baidu app credentials are account-owned secrets and must be supplied out of band. The local encryption key and exact host configuration steps are documented in `docs/issue/baidu-netdisk-env-setup.md`.
## Goal
Create a UniDesk user service that connects to Baidu Netdisk in a containerized way and exposes file storage operations such as login, browse, upload, download, move, rename and delete. The user-facing login should feel similar to the ClaudeQQ page: the UniDesk frontend shows a login card/QR, backend state, recent transfer jobs and explicit raw JSON buttons, while the business backend remains private behind the UniDesk microservice proxy.
## Recommended Approach
Build a small pure-backend user service named `baidu-netdisk` and integrate it as a UniDesk user service. Use Baidu Netdisk official OAuth/API directly in the backend for the first version; optionally add AList or CLI tools later as transfer workers, not as the primary auth or frontend.
Why this route:
- The official API supports OAuth scopes `basic,netdisk`, QR-like login flows, file listing, metadata/dlink retrieval, multipart upload and file management.
- A ClaudeQQ-like login UI maps well to Baidu's Device Code flow: backend requests a `device_code`, frontend displays `qrcode_url` and `user_code`, backend polls token status at the documented interval.
- The UniDesk proxy currently handles text JSON request/response bodies, with a 1 MiB incoming body limit and an 8 MiB response body limit. Large file bytes should therefore not be pushed through `/api/microservices/*/proxy` in v1.
- Keeping tokens and jobs in PostgreSQL gives restart recovery and avoids storing credentials in local JSON files.
## Important UniDesk Constraint
Current `microservice.http` is suitable for control-plane JSON, not bulk binary file transfer:
- backend-core reads non-GET bodies using `req.text()` and rejects bodies larger than 1 MiB.
- provider-gateway reads upstream responses using `response.text()` and caps returned body text at 8 MiB.
- The proxy only forwards `content-type`, not `range`, `content-disposition` or arbitrary binary headers.
So v1 should expose APIs such as `POST /api/transfers/upload-from-path` and `POST /api/transfers/download-to-path`, where the backend container reads/writes files from a mounted staging directory. If browser-to-local uploads/downloads are required, add a separate binary streaming capability to backend-core/provider-gateway in a future change set. That future gateway change would trigger the provider-gateway version-bump rule.
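
The size limits above can be checked before a control-plane call is sent through the proxy. The following is a minimal sketch only: the constant and function names are hypothetical, and it assumes the limits are measured in UTF-8 bytes, which the text above does not state explicitly.

```typescript
// Limits quoted from the constraint list above; all names here are hypothetical.
const MAX_REQUEST_BODY_BYTES = 1 * 1024 * 1024; // backend-core incoming body cap
const MAX_RESPONSE_BODY_BYTES = 8 * 1024 * 1024; // provider-gateway response cap

// Byte length of a string once encoded as UTF-8 (assumed unit of the limits).
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// True when a JSON request body is small enough for the proxy.
function fitsProxyRequest(body: string): boolean {
  return utf8ByteLength(body) <= MAX_REQUEST_BODY_BYTES;
}

// True when an upstream response body would not exceed the gateway cap.
function fitsProxyResponse(body: string): boolean {
  return utf8ByteLength(body) <= MAX_RESPONSE_BODY_BYTES;
}
```

A path-based request such as `{"localPath": "incoming/report.pdf"}` stays far below the cap regardless of file size, which is the point of the staging-directory design.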
## Baidu Netdisk Access Model
### Login
Use Device Code as the default container login:
1. `POST /api/auth/device/start` calls `GET https://openapi.baidu.com/oauth/2.0/device/code?response_type=device_code&client_id=<appKey>&scope=basic,netdisk`.
2. Backend stores `device_code`, `user_code`, `verification_url`, `qrcode_url`, `expires_in` and `interval` in PostgreSQL.
3. UniDesk frontend displays the QR code and user code.
4. `GET /api/auth/device/status?sessionId=...` returns the current login state; the backend polls the Baidu token endpoint no more often than the returned `interval`, and never more often than once every 5 seconds.
5. On success, backend stores `access_token`, `refresh_token`, `expires_in`, `scope`, account metadata and refresh timestamps.
Authorization Code mode is also possible and can pass `qrcode=1`, but it requires a redirect URI and callback endpoint. Device Code is simpler for a private container behind UniDesk because the browser only needs to display a QR URL and poll backend state.
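
As a rough sketch of steps 1 and 4, the device-code URL and the poll throttle could be expressed as below. The function names are hypothetical, and the 5-second floor is taken from the step list above rather than from any verified Baidu response.

```typescript
// Endpoint and query parameters as quoted in step 1; function names are hypothetical.
const DEVICE_CODE_ENDPOINT = "https://openapi.baidu.com/oauth/2.0/device/code";
const MIN_POLL_SECONDS = 5; // floor from step 4

// Builds the device-code request URL for a given app key.
function buildDeviceCodeUrl(appKey: string): string {
  const url = new URL(DEVICE_CODE_ENDPOINT);
  url.searchParams.set("response_type", "device_code");
  url.searchParams.set("client_id", appKey);
  url.searchParams.set("scope", "basic,netdisk");
  return url.toString();
}

// Delay before the next token poll: the returned interval, but never under 5 s.
function nextPollDelayMs(intervalSeconds: number): number {
  const interval = Number.isFinite(intervalSeconds) ? intervalSeconds : MIN_POLL_SECONDS;
  return Math.max(interval, MIN_POLL_SECONDS) * 1000;
}

// True when enough time has passed since the last poll of the token endpoint.
function canPollNow(lastPollAtMs: number, nowMs: number, intervalSeconds: number): boolean {
  return nowMs - lastPollAtMs >= nextPollDelayMs(intervalSeconds);
}
```

Keeping the throttle on the backend means the frontend can call the status endpoint as often as it likes without increasing the request rate seen by Baidu.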
Token handling requirements:
- Access token lifetime is 30 days in the Baidu docs.
- Refresh token is long-lived but the Netdisk docs say it is single-use: after a refresh, store the new refresh token immediately and never retry with the old one in a loop.
- Use a PostgreSQL row lock or advisory lock around refresh to avoid two workers spending the same refresh token concurrently.
- Never log tokens or dlinks; health/status endpoints should only expose redacted auth state.
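
The redaction requirement can be illustrated with a small helper applied to anything that reaches logs or status responses. This is a sketch under assumptions: the helper name and the list of sensitive keys are hypothetical and would need to match the actual field names used by the service.

```typescript
// Hypothetical list of keys whose values must never appear in logs.
const SECRET_KEYS = ["access_token", "refresh_token", "device_code", "dlink"];

// Returns a copy of the value with secret fields and token query parameters masked.
function redactForLog(value: unknown): unknown {
  if (typeof value === "string") {
    // Mask tokens embedded in URLs, e.g. "...?method=list&access_token=abc".
    return value.replace(/([?&](?:access_token|refresh_token)=)[^&\s"]+/g, "$1[redacted]");
  }
  if (Array.isArray(value)) {
    return value.map(redactForLog);
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SECRET_KEYS.includes(key) ? "[redacted]" : redactForLog(inner);
    }
    return out;
  }
  return value;
}
```

Masking by key name rather than by value pattern keeps the helper independent of token formats, at the cost of having to keep the key list current.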
### Scope and Remote Root
Use `scope=basic,netdisk`. Official docs for download still describe third-party app data under `/apps/<productName>` and visible to users as `/我的应用数据/<productName>`, but the file-list docs also define `dir` as an absolute path defaulting to `/`. On 2026-05-13, the current UniDesk Baidu application and authorized account were tested directly against the official APIs: listing `/`, uploading a tiny temporary file to `/unidesk-root-probe-*.txt`, obtaining its `dlink`, downloading it back, verifying MD5, and deleting it all succeeded with `errno=0`. Therefore UniDesk now defaults `UNIDESK_BAIDU_NETDISK_APP_ROOT` to `/` and treats it as the remote working root. Operators can still set `UNIDESK_BAIDU_NETDISK_APP_ROOT=/apps/<name>` to re-enable an app-folder sandbox.
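
One way to enforce the working root, whether it is `/` or `/apps/<name>`, is to resolve every user-supplied path against it and reject anything that climbs out. The sketch below is illustrative only; `resolveRemotePath` is a hypothetical name and it assumes user paths are interpreted relative to the configured root.

```typescript
// Splits a path into segments, dropping empty and "." segments.
function splitSegments(path: string): string[] {
  return path.split("/").filter((segment) => segment !== "" && segment !== ".");
}

// Resolves a user path under the working root (the value of
// UNIDESK_BAIDU_NETDISK_APP_ROOT) and throws if ".." would escape it.
function resolveRemotePath(appRoot: string, userPath: string): string {
  const rootSegments = splitSegments(appRoot);
  if (rootSegments.includes("..")) {
    throw new Error("working root must not contain '..'");
  }
  const relative: string[] = [];
  for (const segment of splitSegments(userPath)) {
    if (segment === "..") {
      if (relative.length === 0) {
        throw new Error(`path escapes the working root: ${userPath}`);
      }
      relative.pop();
    } else {
      relative.push(segment);
    }
  }
  return "/" + [...rootSegments, ...relative].join("/");
}
```

With the default root `/`, `resolveRemotePath("/", "/docs/a.txt")` yields `/docs/a.txt`; with an app-folder sandbox, the same user path lands under `/apps/<name>`.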
### Browse and Metadata
Useful official endpoints:
- User info: `GET /rest/2.0/xpan/nas?method=uinfo`.
- Quota: `GET /api/quota`.
- List directory: `GET /rest/2.0/xpan/file?method=list` with `dir`, paging and sort parameters.
- File metadata and download URL: `GET /rest/2.0/xpan/multimedia?method=filemetas&fsids=[...]&dlink=1`.
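
The list and metadata calls above could be assembled roughly as follows. This is a sketch, not the service's actual client: the function names are hypothetical, the `https://pan.baidu.com` host and the `start`, `limit`, `order` and `desc` parameter names reflect my reading of the official docs and should be verified against them, and `fs_id` values are handled as digit strings on the assumption that they may not fit safely in a JavaScript number.

```typescript
// Assumed API host; paths and method names are the ones listed above.
const PAN_API_BASE = "https://pan.baidu.com";

interface ListOptions {
  dir: string;
  start?: number;
  limit?: number;
  order?: "name" | "time" | "size";
  desc?: boolean;
}

// Builds the directory-listing URL. The result contains the token: never log it.
function buildListUrl(accessToken: string, options: ListOptions): string {
  const url = new URL("/rest/2.0/xpan/file", PAN_API_BASE);
  url.searchParams.set("method", "list");
  url.searchParams.set("access_token", accessToken);
  url.searchParams.set("dir", options.dir);
  url.searchParams.set("start", String(options.start ?? 0));
  url.searchParams.set("limit", String(options.limit ?? 100));
  if (options.order) url.searchParams.set("order", options.order);
  if (options.desc) url.searchParams.set("desc", "1");
  return url.toString();
}

// Builds the filemetas URL; fsIds are digit strings to avoid precision loss.
function buildFileMetasUrl(accessToken: string, fsIds: string[], withDlink: boolean): string {
  for (const id of fsIds) {
    if (!/^\d+$/.test(id)) throw new Error(`invalid fs_id: ${id}`);
  }
  const url = new URL("/rest/2.0/xpan/multimedia", PAN_API_BASE);
  url.searchParams.set("method", "filemetas");
  url.searchParams.set("access_token", accessToken);
  url.searchParams.set("fsids", `[${fsIds.join(",")}]`);
  url.searchParams.set("dlink", withDlink ? "1" : "0");
  return url.toString();
}
```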
### Upload
Official multipart upload sequence:
1. Compute full-file MD5 and per-part MD5 list. For normal users, part size is fixed at 4 MiB. Docs list higher part and total file limits for paid membership tiers.
2. `POST /rest/2.0/xpan/file?method=precreate` with `path`, `size`, `isdir=0`, `autoinit=1`, `rtype` and `block_list`.
3. `GET /rest/2.0/pcs/file?method=locateupload&appid=250528&uploadid=...&upload_version=2.0` and choose an HTTPS upload domain from `servers`.
4. `POST https://<upload-domain>/rest/2.0/pcs/superfile2?method=upload&type=tmpfile&path=...&uploadid=...&partseq=N` as multipart form `file=@chunk` for each required part.
5. `POST /rest/2.0/xpan/file?method=create` with the same `path`, `size`, `isdir`, `rtype`, `uploadid` and ordered `block_list`.
For small files, the official single-step upload endpoint can be a convenience path, but the multipart path is enough for all sizes and gives progress/resume semantics.
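
Before hashing or uploading, the file has to be cut into parts. The sketch below only plans the byte ranges for the 4 MiB normal-user part size mentioned in step 1; MD5 computation and the HTTP calls are left out. The names are hypothetical, and numbering `partseq` from 0 reflects my understanding of the upload docs rather than anything stated above.

```typescript
// Part size for normal users, from step 1 of the upload sequence.
const NORMAL_USER_PART_BYTES = 4 * 1024 * 1024;

interface UploadPart {
  partseq: number; // assumed to start at 0
  offset: number;  // byte offset of the part within the file
  length: number;  // byte length of the part
}

// Splits a file of sizeBytes into consecutive parts of at most partBytes each.
function planUploadParts(sizeBytes: number, partBytes: number = NORMAL_USER_PART_BYTES): UploadPart[] {
  if (!Number.isInteger(sizeBytes) || sizeBytes < 0) {
    throw new RangeError("sizeBytes must be a non-negative integer");
  }
  if (!Number.isInteger(partBytes) || partBytes <= 0) {
    throw new RangeError("partBytes must be a positive integer");
  }
  const parts: UploadPart[] = [];
  for (let offset = 0, partseq = 0; offset < sizeBytes; offset += partBytes, partseq += 1) {
    parts.push({ partseq, offset, length: Math.min(partBytes, sizeBytes - offset) });
  }
  return parts;
}
```

A 10 MiB file yields three parts of 4 MiB, 4 MiB and 2 MiB. Persisting this plan alongside the `uploadid` is one way a job could resume from the first part that has not yet been acknowledged.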
### Download
Official download sequence:
1. Get `fs_id` from list/search.
2. Request file metadata with `dlink=1`.
3. Fetch the `dlink` URL with `&access_token=<token>` appended, sending `User-Agent: pan.baidu.com`.
4. Respect `302` redirects, `Range` for resume, and the documented 8-hour dlink lifetime.
Because browsers cannot safely set the required `User-Agent` and current UniDesk proxy cannot stream large binary responses, the backend should download to staging storage in v1. Browser download can be offered later via a binary streaming proxy endpoint or a backend-owned short-lived internal file endpoint if the gateway/core are upgraded to stream bytes safely.
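
The download steps can be sketched as a pure request builder that the backend worker would hand to its HTTP client. Function and type names are hypothetical; the token is appended by string concatenation rather than by re-serializing the URL, on the assumption that the dlink's existing query string should be left byte-for-byte intact.

```typescript
interface DownloadRequest {
  url: string;
  headers: Record<string, string>;
}

// Documented dlink lifetime from step 4.
const DLINK_LIFETIME_MS = 8 * 60 * 60 * 1000;

// Builds the URL and headers for fetching a dlink, optionally resuming mid-file.
function buildDownloadRequest(dlink: string, accessToken: string, resumeFromByte: number = 0): DownloadRequest {
  const separator = dlink.includes("?") ? "&" : "?";
  const url = `${dlink}${separator}access_token=${encodeURIComponent(accessToken)}`;
  const headers: Record<string, string> = { "User-Agent": "pan.baidu.com" };
  if (resumeFromByte > 0) {
    headers["Range"] = `bytes=${resumeFromByte}-`; // open-ended range for resume
  }
  return { url, headers };
}

// True while a dlink obtained at issuedAtMs is still within its 8-hour lifetime.
function isDlinkFresh(issuedAtMs: number, nowMs: number): boolean {
  return nowMs - issuedAtMs < DLINK_LIFETIME_MS;
}
```

When a resumed job finds its stored dlink stale, it can request fresh metadata with `dlink=1` and continue from `bytes_done` using the same `Range` logic.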
## Third-Party Technology Options
1. Official API in custom backend (recommended for v1)
- Best fit for UniDesk security and UI conventions.
- Precise control over token rotation, path sandboxing, transfer jobs and PostgreSQL persistence.
- Needs implementation of multipart upload/download resume, but the API flow is straightforward.
2. AList as a sidecar or reference implementation
- AList already has a Baidu Netdisk driver and supports storage mounting through its own server.
- Useful if we want WebDAV-like access or want to validate behavior quickly.
- Treat it as an internal sidecar behind the `baidu-netdisk` backend; do not expose AList WebUI as the UniDesk frontend.
- Watch license/upgrade/security posture before embedding it into production.
3. bypy as a Python worker
- Good for app-folder upload/download automation and quick scripts.
- Can run in a worker container for batch operations if we accept Python dependency and app-folder assumptions.
- Less ideal as the primary service API because UniDesk still needs its own auth state, job model and structured frontend.
4. BaiduPCS-Go as a worker
- Strong CLI for batch transfers and resume behavior.
- Could be invoked from the service for jobs after controlled login/config injection.
- Avoid making CLI config files the credential authority; PostgreSQL should remain authoritative.
5. Unofficial or cracked web APIs
- Avoid. They are unstable, hard to validate, and may violate Baidu terms or trigger account risk controls.
## Proposed User Service Contract
### Backend APIs
Expose a pure JSON control API first:
- `GET /health`: service, storage, auth, queue and Baidu API reachability summary.
- `GET /api/auth/status`: redacted configured/logged-in/auth-session summary.
- `POST /api/auth/device/start`: start QR/device login.
- `GET /api/auth/device/status?sessionId=...`: login state and QR metadata. OAuth `authorization_pending` and `slow_down` responses are normal pending states and must not be surfaced as frontend HTTP errors.
- `POST /api/auth/refresh`: force token refresh for diagnostics.
- `POST /api/auth/logout`: revoke local tokens and stop jobs.
- `GET /api/account`: user info and quota.
- `GET /api/files?dir=/&start=0&limit=100`: directory listing under the configured remote working root.
- `GET /api/files/meta?fsids=...&dlink=0|1`: file metadata, optionally including the dlink; the dlink is redacted by default.
- `POST /api/folders`: create folder through `method=create&isdir=1`.
- `POST /api/files/manage`: copy/move/rename/delete using `method=filemanager`.
- `POST /api/transfers/upload-from-path`: read a file inside the mounted staging directory and upload it to Baidu.
- `POST /api/transfers/download-to-path`: download a Baidu file to the staging directory.
- `POST /api/self-test`: create a tiny staging fixture, upload it, verify it appears in `/api/files`, download it back to staging and compare MD5.
- `GET /api/transfers`: list transfer jobs.
- `GET /api/transfers/{id}`: job detail, progress, retry and last error.
- `POST /api/transfers/{id}/cancel` and `POST /api/transfers/{id}/retry`.
- `GET /logs`: recent structured service logs with tokens/dlinks redacted.
If a future binary proxy is added, extend with:
- `POST /api/uploads/sessions` + chunk PUT/POST endpoints.
- `GET /api/downloads/{jobId}/stream` with Range support.
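
The rule that `authorization_pending` and `slow_down` are normal pending states can be made concrete with a small mapping from the token-poll result to the body returned by `GET /api/auth/device/status`. This is a sketch with assumptions: the state names and response shape are hypothetical, `expired_token` and the 5-second back-off on `slow_down` follow RFC 8628 conventions rather than anything confirmed for Baidu, and the token itself is deliberately never copied into the response.

```typescript
type DeviceLoginState = "pending" | "authorized" | "expired" | "failed";

interface DeviceStatusBody {
  state: DeviceLoginState;
  retryAfterSeconds?: number;
  error?: string;
}

// Minimal view of what the token endpoint may return while polling.
interface TokenPollResult {
  access_token?: string;
  error?: string;
}

// Maps a token-poll result to a frontend-safe status body (always HTTP 200).
function toDeviceStatus(poll: TokenPollResult, intervalSeconds: number): DeviceStatusBody {
  if (poll.access_token) {
    return { state: "authorized" };
  }
  switch (poll.error) {
    case "authorization_pending":
      return { state: "pending", retryAfterSeconds: intervalSeconds };
    case "slow_down":
      // RFC 8628 convention (assumed here): back off by 5 more seconds.
      return { state: "pending", retryAfterSeconds: intervalSeconds + 5 };
    case "expired_token":
      return { state: "expired", error: poll.error };
    default:
      return { state: "failed", error: poll.error ?? "unknown_error" };
  }
}
```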
### PostgreSQL Tables
Minimum schema:
- `baidu_netdisk_accounts(id, baidu_uid, username, avatar_url, vip_type, root_path, created_at, updated_at)`.
- `baidu_netdisk_tokens(account_id, access_token_ciphertext, refresh_token_ciphertext, expires_at, scope, generation, last_refresh_at)`.
- `baidu_netdisk_auth_sessions(id, device_code_ciphertext, user_code, verification_url, qrcode_url, expires_at, poll_interval_seconds, status, error, created_at, updated_at)`.
- `baidu_netdisk_transfer_jobs(id, account_id, direction, status, local_path, remote_path, fs_id, size_bytes, bytes_done, part_size, block_list_json, uploadid, retry_count, error, created_at, updated_at)`.
- `baidu_netdisk_transfer_events(id, job_id, level, message, data_json, created_at)`.
Token encryption key should come from an environment variable such as `BAIDU_NETDISK_TOKEN_KEY`; no secrets should be committed.
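
For the transfer job table, a row type and a progress helper might look like the sketch below. Only a subset of the listed columns is modelled, the `direction` values and helper name are hypothetical, and the byte counters accept strings because some PostgreSQL drivers return `BIGINT` columns as strings.

```typescript
// Subset of baidu_netdisk_transfer_jobs columns, named as in the schema above.
interface TransferJobRow {
  id: string;
  direction: "upload" | "download"; // assumed value set
  status: string;
  size_bytes: string | number | null;
  bytes_done: string | number | null;
  retry_count: number;
  error: string | null;
}

// Fraction of the job completed in [0, 1], or null when the size is unknown.
function jobProgress(row: Pick<TransferJobRow, "size_bytes" | "bytes_done">): number | null {
  const size = row.size_bytes === null ? NaN : Number(row.size_bytes);
  const done = row.bytes_done === null ? 0 : Number(row.bytes_done);
  if (!Number.isFinite(size) || size <= 0 || !Number.isFinite(done)) {
    return null;
  }
  return Math.min(1, Math.max(0, done / size));
}
```

Returning `null` instead of `0` for unknown sizes lets the frontend show an indeterminate progress bar rather than a misleading empty one.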
### Container and Deployment
If deployed on D601, use the normal compute-node user-service boundary:
```json
{
"id": "baidu-netdisk",
"name": "Baidu Netdisk",
"providerId": "D601",
"description": "Containerized Baidu Netdisk storage gateway with QR/device login and transfer jobs.",
"repository": {
"url": "https://github.com/pikasTech/baidu-netdisk-unidesk",
"commitId": "<commit>",
"dockerfile": "Dockerfile",
"composeFile": "docker-compose.unidesk.yml",
"composeService": "baidu-netdisk",
"containerName": "baidu-netdisk-backend"
},
"backend": {
"nodeBaseUrl": "http://host.docker.internal:3295",
"nodeBindHost": "127.0.0.1",
"nodePort": 3295,
"proxyMode": "provider-gateway-http",
"frontendOnly": true,
"public": false,
"allowedMethods": ["GET", "HEAD", "POST", "DELETE"],
"allowedPathPrefixes": ["/health", "/logs", "/api/"],
"healthPath": "/health",
"timeoutMs": 30000
},
"development": {
"providerId": "D601",
"sshPassthrough": true,
"worktreePath": "/home/ubuntu/baidu-netdisk-unidesk"
},
"frontend": {
"route": "/apps/baidu-netdisk",
"integrated": true
}
}
```
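
The `allowedMethods` and `allowedPathPrefixes` fields in the config above suggest a simple allow-list check on each proxied request. The following sketch shows one plausible interpretation; the real provider-gateway matching rules may differ, and the type and function names are hypothetical.

```typescript
// The two allow-list fields from the "backend" block of the config above.
interface BackendProxyPolicy {
  allowedMethods: string[];
  allowedPathPrefixes: string[];
}

// True when the method is listed and the path (without query) matches a prefix.
// Prefixes ending in "/" match anything beneath them; others match the exact
// path or a sub-path, so "/health" does not match "/healthz".
function isProxyRequestAllowed(policy: BackendProxyPolicy, method: string, path: string): boolean {
  const queryIndex = path.indexOf("?");
  const pathname = queryIndex === -1 ? path : path.slice(0, queryIndex);
  if (pathname.split("/").includes("..")) {
    return false; // refuse traversal segments outright
  }
  const methodAllowed = policy.allowedMethods.includes(method.toUpperCase());
  const pathAllowed = policy.allowedPathPrefixes.some((prefix) =>
    prefix.endsWith("/")
      ? pathname.startsWith(prefix)
      : pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
  return methodAllowed && pathAllowed;
}
```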
If deployed on the main server, use a Compose-internal backend URL such as `http://baidu-netdisk:4244` and add a service to the root `docker-compose.yml`. Main-server deployment is justified only if UniDesk needs central storage on the main server; otherwise D601 or another compute node is cleaner.
### Frontend Page
Add a dedicated `src/components/frontend/src/baidu-netdisk.tsx` page and route tab:
- Login panel: QR image from `qrcode_url`, user code, expires timer, poll status, refresh QR and logout buttons.
- Account cards: username, UID, quota used/total, VIP state, remote working root path.
- File browser: breadcrumb rooted at the configured working root, now `/` by default, paginated table, folder creation, rename/delete/move controls.
- Transfer panel: upload-from-path form, download-to-path form, job rows, progress bars, speed, ETA, retry/cancel buttons.
- Safety text: private backend mapping, token storage redacted, no direct public Baidu token exposure.
- Raw JSON only behind explicit buttons, following existing user-service conventions.
## Acceptance Plan
Focused checks after implementation:
- `bun scripts/cli.ts microservice list` shows `baidu-netdisk`, private backend, target provider and container summary.
- `bun scripts/cli.ts microservice health baidu-netdisk` returns `ok=true`, `service=baidu-netdisk`, `storage=postgres` and redacted auth state.
- `bun scripts/cli.ts microservice proxy baidu-netdisk /api/auth/device/start --method POST` returns a login session with QR/user-code metadata but no token.
- After manual QR authorization, `/api/account` and `/api/files?dir=<root>` return user/quota/list data.
- Upload/download tests can use `bun scripts/cli.ts microservice proxy baidu-netdisk /api/self-test --method POST --raw`; the response must include a remote path in the working root, an `fsId`, succeeded upload/download jobs, and matching `expectedMd5`/`downloadedMd5`.
- Public port probes must fail for the service port; frontend access only through UniDesk.
- Playwright verifies `/app/baidu-netdisk/` renders shell, login card, account/quota, file browser, transfer panel and no naked JSON.
Full regression should later add `microservice:catalog-baidu-netdisk`, `microservice:baidu-netdisk-health`, login-state checks and `frontend:baidu-netdisk-integrated-visible` to `scripts/src/e2e.ts`.
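
The self-test acceptance criteria can be turned into an explicit check. In the sketch below, `fsId`, `expectedMd5` and `downloadedMd5` are the field names mentioned above; `remotePath`, `uploadJobStatus` and `downloadJobStatus` are hypothetical stand-ins for however the real response reports the remote path and job outcomes.

```typescript
interface SelfTestResponse {
  remotePath?: string;        // hypothetical field name
  fsId?: string | number;
  expectedMd5?: string;
  downloadedMd5?: string;
  uploadJobStatus?: string;   // hypothetical field name
  downloadJobStatus?: string; // hypothetical field name
}

// Returns human-readable failures; an empty array means the self-test passed.
function selfTestFailures(response: SelfTestResponse, workingRoot: string): string[] {
  const failures: string[] = [];
  const rootPrefix = workingRoot.endsWith("/") ? workingRoot : `${workingRoot}/`;
  if (!response.remotePath || !response.remotePath.startsWith(rootPrefix)) {
    failures.push("remote path is missing or outside the working root");
  }
  if (response.fsId === undefined || response.fsId === "") {
    failures.push("fsId is missing");
  }
  if (response.uploadJobStatus !== "succeeded") {
    failures.push("upload job did not succeed");
  }
  if (response.downloadJobStatus !== "succeeded") {
    failures.push("download job did not succeed");
  }
  const expected = (response.expectedMd5 ?? "").toLowerCase();
  const downloaded = (response.downloadedMd5 ?? "").toLowerCase();
  if (expected === "" || expected !== downloaded) {
    failures.push("MD5 mismatch between uploaded fixture and downloaded copy");
  }
  return failures;
}
```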
## Open Risks
- Baidu app review/permissions may block upload/download until the app is approved and scoped correctly.
- Device Code QR expires quickly; frontend needs clear countdown and refresh behavior.
- Refresh token is single-use per Netdisk docs; a race can force re-login if token rotation is not serialized.
- Browser direct download is not a v1 fit because official dlink download requires `User-Agent: pan.baidu.com` and current UniDesk proxy cannot stream large binary safely.
- Large upload/download jobs need resumable local job records and cleanup of temporary chunks/staged files.
- Using AList, BaiduPCS-Go or bypy may introduce third-party license and maintenance risk; keep their configs and token caches derived from UniDesk PostgreSQL rather than treating them as authoritative.
## Sources
- Baidu Netdisk Open Platform overview: https://pan.baidu.com/union/doc/nksg0sbfs
- Authorization Code mode: https://pan.baidu.com/union/doc/al0rwqzzl
- Device Code mode: https://pan.baidu.com/union/doc/fl1x114ti
- User info: https://pan.baidu.com/union/doc/pksg0s9ns
- Quota: https://pan.baidu.com/union/doc/Cksg0s9ic
- File list: https://pan.baidu.com/union/doc/nksg0sat9
- File metadata/dlink: https://pan.baidu.com/union/doc/Fksg0sbcm
- Precreate upload: https://pan.baidu.com/union/doc/3ksg0s9r7
- Multipart upload: https://pan.baidu.com/union/doc/nksg0s9vi
- Create file/folder: https://pan.baidu.com/union/doc/rksg0sa17
- Single-step upload: https://pan.baidu.com/union/doc/olkuuy5kz
- Locate upload domain: https://pan.baidu.com/union/doc/Mlvw5hfnr
- Download: https://pan.baidu.com/union/doc/pkuo3snyp
- File manager: https://pan.baidu.com/union/doc/mksg0s9l4
- AList Baidu Netdisk driver docs: https://alist-repo.github.io/docs/guide/drivers/baidu.html
- bypy project page: https://pypi.org/project/bypy/
- BaiduPCS-Go repository: https://github.com/qjfoidnh/BaiduPCS-Go