[Data Liberation] Expose experimental Markdown importer in the importWxr step#2080
Closed
adamziel wants to merge 13 commits into
Collaborator
Author
|
This PR needs to be split into smaller parts before merging. The new vendor libraries will certainly become a separate PR, and the EPUB and HTML importers probably will, too.
adamziel
added a commit
that referenced
this pull request
Dec 17, 2024
Adds a forked version of the Markdown parsing libraries required by the upcoming Markdown importer. We need our own fork for PHP 7.2 compatibility. The downgrade process was performed semi-automatically via Rector.

This PR adds the following libraries:
* `league/commonmark`
* `webuni/front-matter`

There are no testing steps here. This PR only adds new code without modifying the existing one.

A part of #2080
adamziel
added a commit
that referenced
this pull request
Dec 17, 2024
Adds a forked version of the Markdown parsing libraries required by the upcoming Markdown importer. We need our own fork for PHP 7.2 compatibility. The downgrade process was performed semi-automatically via Rector.

This PR adds the following libraries:
* `league/commonmark`
* `webuni/front-matter`

There are no testing steps here. This PR only adds new code without modifying the existing one.

A part of:
* #2080
* #1894
…Wxr step
🚧 Work in progress, don't merge 🚧
Enables importing markdown files via the `importWxr` step (to be
renamed) when the data-liberation importer is enabled.
Here's the Blueprint you can use to import the "data basics" tutorial
from the Gutenberg repo:
```json
{
  "$schema": "https://playground.wordpress.net/blueprint-schema.json",
  "landingPage": "/adding-a-delete-button/",
  "features": {
    "networking": true
  },
  "steps": [
    {
      "step": "resetData"
    },
    {
      "step": "importWxr",
      "importer": "data-liberation",
      "phpImporterOptions": {
        "data_source": "markdown_directory",
        "source_site_url": "https://raw.githubusercontent.com/WordPress/gutenberg/HEAD/docs/how-to-guides/data-basics"
      },
      "importData": {
        "resource": "git:directory",
        "url": "https://github.com/WordPress/gutenberg.git",
        "ref": "HEAD",
        "path": "docs/how-to-guides/data-basics"
      }
    }
  ]
}
```
## Remaining work
* Confirm the WXR import still works both for the regular importer and
the data liberation one
* Add E2E coverage
* Rewrite relative markdown URLs
* Enable specifying additional URL mappings directly in the Blueprint
* Review the code and make any architectural adjustments necessary
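The "Rewrite relative markdown URLs" item above could work roughly as follows. This is only an illustrative Python sketch, not the importer's implementation (which is PHP and was still unwritten at this point). The function name `rewrite_relative_links` and the simplified link regex are assumptions made for the example; only the idea of resolving links against a base such as `source_site_url` comes from the Blueprint.

```python
import re
from urllib.parse import urljoin, urlparse

# Matches inline Markdown links and images: [text](target) or ![alt](target).
# Deliberately simplified: reference-style links, link titles, and nested
# brackets are not handled and are left untouched.
_INLINE_LINK = re.compile(r"(!?\[[^\]]*\]\()([^)\s]+)(\))")


def rewrite_relative_links(markdown: str, base_url: str) -> str:
    """Resolve relative link targets in `markdown` against `base_url`.

    Hypothetical helper for illustration only.
    """
    # Treat the base as a directory so "./page.md" resolves inside it.
    if not base_url.endswith("/"):
        base_url += "/"

    def _replace(match):
        prefix, target, suffix = match.groups()
        # Keep absolute URLs (anything with a scheme), protocol-relative
        # URLs, and in-page anchors exactly as they are.
        if urlparse(target).scheme or target.startswith(("#", "//")):
            return match.group(0)
        return prefix + urljoin(base_url, target) + suffix

    return _INLINE_LINK.sub(_replace, markdown)
```

Called with the `source_site_url` from the Blueprint as `base_url`, a link such as `[next](./part-2.md)` would resolve to a URL under that directory, while `https://` links and `#anchors` pass through unchanged.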
…zed WP_Markdown_Directory_Tree_Reader
adamziel
added a commit
that referenced
this pull request
Dec 17, 2024
Builds data-liberation-markdown.phar.gz (200KB) to enable downloading the Markdown importer only when needed instead of on every page load.

A part of:
* #2080
* #1894

## Testing instructions
Run `nx build playground-data-liberation-markdown` and confirm it finishes without errors. A smoke test of the built phar file is included in the build command.
f522d40 to
4a31689
Compare
Collaborator
Author
|
I'm going to close this PR. I've reorganized it as a series of smaller ones that we can discuss granularly:
After all the API changes, I'm no longer sure setting up the importer in |
zaerl
pushed a commit
that referenced
this pull request
Jan 8, 2025
Sets the stage for the EPub importer. A part of #2080

Refactors and cleans up the Data Liberation package. This includes renaming, reorganizing file paths, improving class structure, and removing deprecated/unused code.

## Key Changes
**Refactor:**
- Renamed `WP_WXR_Reader` to `WP_WXR_Entity_Reader` for consistency and clarity.
- Adjusted references in related classes, tests, and imports.
- Moved `byte-readers` to the Blueprints library (see WordPress/php-toolkit#121)

**Cleanup:**
- Deleted unused and redundant byte reader classes (`WP_Byte_Reader`, `WP_File_Reader`, etc.).
- Removed legacy files such as `WXR_Import_Info`.

**New Additions:**
- Added `WP_Directory_Tree_Entity_Reader` to improve handling of directory tree imports.
- Introduced `WP_Import_HTML_Processor` for better HTML import functionality.

## Testing instructions
Confirm the CI tests passed
adamziel
added a commit
that referenced
this pull request
Jan 9, 2025
Builds data-liberation-markdown.phar.gz (200KB) to enable downloading the Markdown importer only when needed instead of on every page load.

A part of:
* #2080
* #1894

## Testing instructions
Run `nx build playground-data-liberation-markdown` and confirm it finishes without errors. A smoke test of the built phar file is included in the build command.
🚧 Work in progress, don't merge 🚧
Enables importing markdown and epub files via the `importWxr` step (to be renamed) when the data-liberation importer is enabled.
Here's the Blueprint you can use to import the "data basics" tutorial from the Gutenberg repo:
```json
{
  "$schema": "https://playground.wordpress.net/blueprint-schema.json",
  "landingPage": "/adding-a-delete-button/",
  "features": {
    "networking": true
  },
  "steps": [
    {
      "step": "resetData"
    },
    {
      "step": "importWxr",
      "importer": "data-liberation",
      "phpImporterOptions": {
        "data_source": "markdown_directory",
        "source_site_url": "https://raw.githubusercontent.com/WordPress/gutenberg/HEAD/docs/how-to-guides/data-basics"
      },
      "importData": {
        "resource": "git:directory",
        "url": "https://github.com/WordPress/gutenberg.git",
        "ref": "HEAD",
        "path": "docs/how-to-guides/data-basics"
      }
    }
  ]
}
```
Requires WordPress/php-toolkit#121
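To point the same Blueprint at a different Markdown directory, it can be generated instead of hand-edited. The Python sketch below is an illustration only: the helper name `build_markdown_import_blueprint` is made up, and the step and option names are copied from the example above, which belongs to an experimental, unmerged PR and may have changed since. It assumes a github.com repository so that `source_site_url` can be derived from the `raw.githubusercontent.com` URL pattern used in the example.

```python
import json
from urllib.parse import urlparse


def build_markdown_import_blueprint(repo_url, path, ref="HEAD", landing_page="/"):
    """Return a Blueprint dict shaped like the example above (hypothetical helper)."""
    parsed = urlparse(repo_url)
    if parsed.netloc != "github.com":
        raise ValueError("This sketch only derives raw URLs for github.com repositories")

    # "/WordPress/gutenberg.git" -> "WordPress/gutenberg"
    repo_slug = parsed.path.strip("/")
    if repo_slug.endswith(".git"):
        repo_slug = repo_slug[: -len(".git")]
    clean_path = path.strip("/")

    return {
        "$schema": "https://playground.wordpress.net/blueprint-schema.json",
        "landingPage": landing_page,
        "features": {"networking": True},
        "steps": [
            {"step": "resetData"},
            {
                "step": "importWxr",
                "importer": "data-liberation",
                "phpImporterOptions": {
                    "data_source": "markdown_directory",
                    # Base URL the Markdown files are served from.
                    "source_site_url": (
                        f"https://raw.githubusercontent.com/{repo_slug}/{ref}/{clean_path}"
                    ),
                },
                "importData": {
                    "resource": "git:directory",
                    "url": repo_url,
                    "ref": ref,
                    "path": clean_path,
                },
            },
        ],
    }


blueprint = build_markdown_import_blueprint(
    "https://github.com/WordPress/gutenberg.git",
    "docs/how-to-guides/data-basics",
    landing_page="/adding-a-delete-button/",
)
print(json.dumps(blueprint, indent=2))
```

With the arguments shown, the printed JSON should match the Blueprint in the PR description key for key.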
Other code examples
Combining the new importer APIs is getting ridiculous. Here are two entity readers:
We can mix and match data sources (local filesystem, remote), formats (e.g. md, xhtml, wxr), and containers (plain, .zip, git in the future).
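The mix-and-match idea can be pictured as three independent, swappable pieces: a source that yields bytes, a container that splits those bytes into files, and a format that turns a file into an entity. The Python sketch below only illustrates that composition. Every name in it is hypothetical, and it does not reproduce the Data Liberation PHP classes or the entity reader examples from the original description.

```python
import io
import zipfile
from typing import Callable, Dict, Iterator, Tuple

# NOTE: all names here are hypothetical and exist only for this sketch.
FileEntry = Tuple[str, bytes]


def plain_container(name: str, data: bytes) -> Iterator[FileEntry]:
    """A single file is its own container."""
    yield name, data


def zip_container(name: str, data: bytes) -> Iterator[FileEntry]:
    """Yield every regular file stored in a .zip archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if not info.is_dir():
                yield info.filename, archive.read(info)


def markdown_format(path: str, data: bytes) -> Dict[str, str]:
    """Turn one Markdown file into a post-like entity."""
    text = data.decode("utf-8")
    first_line = next(iter(text.splitlines()), "")
    # Use the first line (minus heading markers) as the title, else the path.
    title = first_line.lstrip("#").strip() or path
    return {"type": "post", "source_path": path, "title": title, "content": text}


def read_entities(
    source: Callable[[], bytes],
    name: str,
    container: Callable[[str, bytes], Iterator[FileEntry]],
    fmt: Callable[[str, bytes], Dict[str, str]],
) -> Iterator[Dict[str, str]]:
    """Compose a data source, a container, and a format into an entity stream."""
    data = source()
    for path, file_bytes in container(name, data):
        yield fmt(path, file_bytes)
```

Because each piece is just a callable, swapping `zip_container` for `plain_container`, or a local-file source for a remote one, would not require touching the other two.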
Remaining work