Add research catalog CLI and training plan
34
docs/PDF_WORKFLOW.md
Normal file
@@ -0,0 +1,34 @@
# PDF Workflow

Goal: keep research PDFs in a known place, catalog them, and open them fast.

## Defaults

- Research root: `~/Documents/Research`
- Catalog output: `~/src/context/index/research_catalog.json`

## Commands

```sh
# Index the research root into the catalog JSON
python -m afs_scawful research catalog
# List cataloged PDFs
python -m afs_scawful research list
# Show one catalog entry by ID
python -m afs_scawful research show 2512-20957v2-XXXXXXXX
# Look up an entry by ID; --open launches the OS default PDF viewer
python -m afs_scawful research open 2512-20957v2-XXXXXXXX --open
```

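The internals of the `catalog` command are not documented here. As a rough mental model only, a catalog pass could walk the research root for PDFs and write one JSON record per file. This is a minimal sketch under assumptions: the function names `build_catalog` and `write_catalog`, the record fields (`id`, `path`, `bytes`, `mtime`), and the use of the file stem as the ID are all hypothetical and are not the real `afs_scawful` schema or ID scheme.

```python
import json
from pathlib import Path


def build_catalog(root: Path) -> list[dict]:
    """Hypothetical sketch: one record per PDF under `root`, sorted by path."""
    root = root.expanduser()
    # Match .pdf case-insensitively; rglob("*.pdf") alone misses "PAPER.PDF" on Linux.
    pdfs = sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")
    records = []
    for pdf in pdfs:
        stat = pdf.stat()
        records.append(
            {
                "id": pdf.stem,  # hypothetical ID; the real CLI uses its own scheme
                "path": str(pdf),
                "bytes": stat.st_size,
                "mtime": stat.st_mtime,
            }
        )
    return records


def write_catalog(records: list[dict], out: Path) -> None:
    """Write records as pretty-printed JSON, creating parent directories."""
    out = out.expanduser()
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(records, indent=2))
```

With the defaults above, a call would look like `write_catalog(build_catalog(Path("~/Documents/Research")), Path("~/src/context/index/research_catalog.json"))`. Sorting by path keeps the output stable between runs, which makes catalog diffs easy to review.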
## Overrides

- `AFS_RESEARCH_ROOT=/path/to/Research` overrides the research root.
- `AFS_RESEARCH_CATALOG=/path/to/research_catalog.json` overrides the catalog output path.
- Optional config: `research_paths.toml` in `~/.config/afs/afs_scawful/` or
  `~/.config/afs/plugins/afs_scawful/config/`.

Example `research_paths.toml`:

```toml
[paths]
research_root = "~/Documents/Research"
research_catalog = "~/src/context/index/research_catalog.json"
```

## Notes

- Abstract excerpts are auto-extracted from the first pages of each PDF; verify them before quoting.
- `--open` uses the OS default PDF viewer (Preview on macOS).
- For richer metadata extraction, install the optional `research` extra:
  `pip install -e '.[research]'`

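For reference, the usual ways to hand a file to the OS default viewer are `open` on macOS, `xdg-open` on most Linux desktops, and the `start` builtin of `cmd.exe` on Windows. A minimal sketch of that dispatch follows; whether `afs_scawful` implements `--open` this way is an assumption, and `viewer_command` is a hypothetical name. It only builds the command so it can be inspected; pass the result to `subprocess.run(..., check=True)` to actually launch the viewer.

```python
import sys


def viewer_command(pdf_path: str, platform: str = sys.platform) -> list[str]:
    """Return a command that opens `pdf_path` in the OS default viewer."""
    if platform == "darwin":
        return ["open", pdf_path]  # macOS: Preview unless the user changed the default
    if platform.startswith("linux"):
        return ["xdg-open", pdf_path]  # freedesktop default-application launcher
    if platform == "win32":
        # `start` is a cmd.exe builtin; its first quoted argument is the window title.
        return ["cmd", "/c", "start", "", pdf_path]
    raise RuntimeError(f"no default-viewer launcher known for {platform!r}")
```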
@@ -1,7 +1,7 @@
# STATUS

Stage: Prototype
Now: config helpers; dataset registry builder; resource indexer; training sample model; validator base + initial validators; doc-section generator; pytest coverage.
Now: config helpers; dataset registry builder; resource indexer; training sample model; validator base + initial validators; doc-section generator; research catalog CLI + PDF workflow docs; pytest coverage.
Not yet: more generators; training runner; dataset QA reports.
Next: add generator QA summary + manifest; wire generator outputs into AFS Studio.
Issues: no training runtime yet.

48
docs/TRAINING_PLAN.md
Normal file
@@ -0,0 +1,48 @@
# Training Plan (AFS Scawful)

Scope: local-only training data pipelines and evaluation for AFS workflows.
Research-only. See `../afs/docs/RESEARCH_SOURCES.md` for citations.

## Goals

- Keep datasets reproducible, small, and auditable.
- Prioritize agentic filesystem primitives before model training complexity.
- Use evaluation loops to avoid training on noise.

## Phase 0: Inventory + Research Catalog (now)

- Use `python -m afs_scawful research catalog` to index `~/Documents/Research`.
- Keep the catalog JSON at `~/src/context/index/research_catalog.json`.
- Verify metadata and abstract excerpts before quoting. [R1]

## Phase 1: Dataset QA (near-term)

- Expand the dataset registry with QA summaries (row counts, schema drift, invalid rows).
- Define a minimal JSON schema for training samples.
- Track provenance per dataset and per generator. [R1]

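A minimal sketch of a Phase 1 QA summary, assuming samples are stored as JSON Lines and that the minimal schema is a set of required string fields. The field names (`id`, `prompt`, `response`, `source`) and the function `qa_summary` are hypothetical placeholders, not the registry's actual schema. Schema drift is approximated here as "more than one distinct key set across rows", and `source` stands in for per-generator provenance.

```python
import json
from collections import Counter

# Hypothetical minimal schema: every sample needs these string fields.
REQUIRED_FIELDS = ("id", "prompt", "response", "source")


def qa_summary(lines):
    """Summarize JSONL rows: counts, invalid rows, key-set drift, provenance."""
    total = 0
    invalid = []  # (line number, reason)
    key_sets = Counter()  # distinct key sets seen across parsed objects
    by_source = Counter()  # valid rows per provenance label
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are not counted as rows
        total += 1
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            invalid.append((lineno, f"bad json: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            invalid.append((lineno, "not an object"))
            continue
        key_sets[frozenset(row)] += 1
        missing = [f for f in REQUIRED_FIELDS if not isinstance(row.get(f), str)]
        if missing:
            invalid.append((lineno, "missing/non-string: " + ", ".join(missing)))
            continue
        by_source[row["source"]] += 1
    return {
        "rows": total,
        "valid": total - len(invalid),
        "invalid": invalid,
        "schema_drift": len(key_sets) > 1,
        "by_source": dict(by_source),
    }
```

Reporting invalid rows by line number keeps the summary auditable: each rejected sample can be traced back to its exact position in the source file.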
## Phase 2: Task Design (near-term)

- Start with repo-level navigation tasks that assume a small tool surface. [R3]
- Keep tasks focused on file discovery, symbol lookup, and context assembly.
- Use small, deterministic datasets to validate task framing before scaling.

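One way to keep Phase 2 datasets small and deterministic is to derive file-discovery tasks from a fixed list of repository paths, using a seeded local RNG so that reruns produce identical samples. A minimal sketch follows; the task shape (`question`/`answer`) and `make_file_discovery_tasks` are hypothetical. Files whose basename appears more than once are skipped so that every question has exactly one correct answer.

```python
import random
from collections import Counter
from pathlib import PurePosixPath


def make_file_discovery_tasks(repo_files, n, seed=0):
    """Build up to n 'where is <file>?' tasks, reproducible for a given seed."""
    files = sorted(set(repo_files))  # sort first so input order doesn't matter
    names = Counter(PurePosixPath(p).name for p in files)
    # Keep only unambiguous targets: basenames that occur exactly once.
    unique = [p for p in files if names[PurePosixPath(p).name] == 1]
    rng = random.Random(seed)  # local RNG: independent of global random state
    picked = rng.sample(unique, k=min(n, len(unique)))
    return [
        {
            "question": f"Which path contains the file named {PurePosixPath(p).name!r}?",
            "answer": p,
        }
        for p in picked
    ]
```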
## Phase 3: Context Packaging (mid-term)

- Treat training samples as explicit context pipelines with clear state and error
  propagation. [R4]
- Build a minimal "context transcript" format (inputs, tool calls, outputs).

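A minimal sketch of what a "context transcript" record might look like, using standard-library dataclasses. The class and field names, the `ToolCall` shape, and the rule that any failed tool call marks the whole transcript as failed (one possible reading of "clear state and error propagation") are assumptions, not a settled format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    tool: str  # e.g. a file-discovery or symbol-lookup tool (hypothetical names)
    args: dict
    output: Optional[str] = None
    error: Optional[str] = None  # set instead of output when the call failed


@dataclass
class ContextTranscript:
    inputs: dict
    tool_calls: list[ToolCall] = field(default_factory=list)
    outputs: Optional[dict] = None

    @property
    def failed(self) -> bool:
        # Propagate errors: any failed tool call taints the whole transcript.
        return any(call.error is not None for call in self.tool_calls)

    def to_json(self) -> str:
        data = asdict(self)  # recursively converts nested ToolCall entries
        data["failed"] = self.failed
        return json.dumps(data, sort_keys=True)
```

Serializing with `sort_keys=True` gives byte-stable output, which keeps transcript datasets diffable.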
## Phase 4: Evaluation (mid-term)

- Add human + agent evaluation metrics to avoid overfitting to synthetic tasks. [R7]
- Include tone-variant prompts as a controlled ablation (optional). [R6]

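If the tone-variant ablation is run, human and agent scores are easier to compare when they are aggregated per variant side by side. A minimal sketch, assuming each result is a record with a `variant` label, an `evaluator` of `"human"` or `"agent"`, and a numeric `score`; these field names and `summarize_ablation` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_ablation(results):
    """Mean score per (variant, evaluator), plus the agent-minus-human gap."""
    buckets = defaultdict(list)
    for r in results:
        buckets[(r["variant"], r["evaluator"])].append(r["score"])
    summary = {}
    for (variant, evaluator), scores in sorted(buckets.items()):
        summary.setdefault(variant, {})[evaluator] = mean(scores)
    for by_eval in summary.values():
        if "human" in by_eval and "agent" in by_eval:
            # A large gap may mean the agent metric is drifting from human judgment.
            by_eval["gap"] = by_eval["agent"] - by_eval["human"]
    return summary
```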
## Phase 5: Efficiency References (later)

- Turn to mixture-of-experts (MoE) efficiency papers only when scaling becomes a bottleneck. [R5]

## Unknown / needs verification

- Which tasks best reflect AFS workflows (agentic filesystem vs. orchestration).
- Whether RL is needed, or whether supervised data is sufficient for early stages.

## Citations

- [R1] `../afs/docs/RESEARCH_SOURCES.md`
- [R3] `../afs/docs/RESEARCH_SOURCES.md`
- [R4] `../afs/docs/RESEARCH_SOURCES.md`
- [R5] `../afs/docs/RESEARCH_SOURCES.md`
- [R6] `../afs/docs/RESEARCH_SOURCES.md`
- [R7] `../afs/docs/RESEARCH_SOURCES.md`

@@ -5,6 +5,7 @@ Scope: AFS Scawful training data pipelines and monitoring. Research-only.
## Committed (exists now)

- Dataset registry indexing (local)
- Resource indexing (local)
- Research PDF catalog (local)
- Plugin config loader for training paths/resources
- Validator base + initial validators (ASM/C++/KG/ASAR)
- Generator base + doc-section generator
