DTP


 
Lively discussions on the graphic arts and publishing — in print or on the web


Old 12-01-2020, 10:50 AM   #1
Kayza
Member
 
Join Date: Jun 2010
Posts: 376
Export from WordPress

I have a WordPress site that needs to be totally revamped. The content is OK; it's just everything else that needs to be redone.

I'd like to pull the text out. I don't need the formatting, and I definitely do NOT want the HTML tags. Is there any way to do this?
Old 12-01-2020, 01:22 PM   #2
terrie
Sysop
 
Join Date: Oct 2004
Posts: 10,478
Default

I've never gotten around to doing anything with WP but I think there are a couple of members who have used it (still use it?). Until they arrive with their words of wisdom, I found a few links that might prove useful--in no particular order:


1. WP's own support info on exporting...


2. 3rd party info that looks pretty comprehensive...



3. Another 3rd party site that looks useful...




Keep us posted on how it's going.




Terrie
Old 12-01-2020, 02:43 PM   #3
Kayza
Member
 
Join Date: Jun 2010
Posts: 376
Default

Thanks!


I'll take a look at these.
Old 12-01-2020, 03:00 PM   #4
Kayza
Member
 
Join Date: Jun 2010
Posts: 376
Default

So, this is not really what I want. This will give me the XML and let me migrate the whole site over. But I don't want to do that. I really only want to pull the actual text. (I already have backups of the site.)
Old 12-02-2020, 02:43 AM   #5
Barrie Greed
Member
 
Join Date: May 2006
Location: Stringston, Somerset,UK
Posts: 236
Default

Quote:
Originally Posted by Kayza View Post
So, this is not really what I want. This will give me the XML and let me migrate the whole site over. But I don't want to do that. I really only want to pull the actual text. (I already have backups of the site.)

Sounds to me that you need to be using regular expressions to get this done.


Try searching Google for "regex to remove HTML tags", which should throw up some options. You will need some basic scripting ability in something like JavaScript or Python.
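
As a rough illustration of the regex idea, here is a minimal Python sketch using only the standard library. The `SAMPLE` string and the `strip_tags` name are made up for the example; the real markup would come from your own posts. A pattern like this is fragile (it does not handle scripts, styles, comments, or a `>` inside an attribute), so treat it as a starting point rather than a finished tool.

```python
import html
import re

# Hypothetical sample markup; replace with HTML from your own posts.
SAMPLE = (
    '<p>Hello, <a href="/about">world</a> &amp; friends.</p>\n'
    "<p>Second paragraph.</p>"
)

# Matches anything that looks like a tag: "<", then any run of non-">", then ">".
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(markup: str) -> str:
    """Remove tag-like spans, decode entities, and tidy spaces."""
    text = TAG_RE.sub("", markup)        # drop <p>, </a>, and so on
    text = html.unescape(text)           # turn &amp; back into &
    text = re.sub(r"[ \t]+", " ", text)  # collapse runs of spaces and tabs
    return text.strip()


print(strip_tags(SAMPLE))
```

Line breaks in the input are kept, so each paragraph of the sample stays on its own line.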


Let me know if you need further help and I will try to assist but I am no expert.



Barrie Greed
Old 12-02-2020, 03:21 AM   #6
Bo Aakerstrom
Member
 
Join Date: Mar 2005
Location: Derby,UK
Posts: 1,509
Default

Look into web scraping. Beautiful Soup is a Python-based tool for this.

Since it is your website, it is OK to scrape it; stealing other people's content isn't. Just sayin'
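
To give a feel for what scraping the text out of a page involves, here is a small Python sketch. It deliberately uses the standard library's `html.parser` as a stand-in rather than Beautiful Soup, so it runs without installing anything; Beautiful Soup's `get_text()` covers the same ground more robustly. The names `TextExtractor`, `html_to_text`, and `fetch_text` are invented for this example, and the lists of tags to skip or break on are assumptions you would tune for your own theme.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring script and style content."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "br", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # start block-level elements on a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(markup: str) -> str:
    """Return the text of an HTML fragment, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def fetch_text(url: str) -> str:
    """Download one page (from your own site) and return its text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))


print(html_to_text("<h1>Title</h1><p>One &amp; two.</p><script>var x = 1;</script>"))
```

Note that a whole page also includes menus, sidebars, and footers, so in practice you would probably want to narrow the extraction to the element that holds the post body.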

   
__________________
www.boaakerstrom.com
Behance Portfolio
Old 12-03-2020, 02:15 PM   #7
Kayza
Member
 
Join Date: Jun 2010
Posts: 376
Default

Quote:
Originally Posted by Bo Aakerstrom View Post
Look into web scraping. Beautiful Soup is a Python-based tool for this.

Since it is your website, it is OK to scrape it; stealing other people's content isn't. Just sayin'

I hadn't thought about web scraping. I mean, most of the time, why would you need to scrape your own content, right? But it just might be the right approach here.
Old 12-05-2020, 10:05 AM   #8
Steve Rindsberg
Staff
 
Join Date: Nov 2004
Posts: 7,714
Default

Looks like there are lots of web scraper extensions for Chrome. And all sorts of other things turn up when you Google "web scrape" or the like. Some have clever names (Octoparse!) but I'm amazed that a Google search doesn't turn up one named "ScrapeGoat". Seems so OBVIOUS!

   
__________________
Steve Rindsberg
====================
www.pptfaq.com
www.pptools.com
and stuff
Old 12-05-2020, 01:06 PM   #9
terrie
Sysop
 
Join Date: Oct 2004
Posts: 10,478
Default

LOL!!!!



Terrie
Old 12-04-2020, 01:40 PM   #10
terrie
Sysop
 
Join Date: Oct 2004
Posts: 10,478
Default

Quote:
kayza: I really only want to pull the actual text. (I already have backups of the site.)
Ahhhh...I thought one of the options those approaches might offer would be exactly that, because it seems to me to be a fairly common need...so...sorry about that...
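
For what it's worth, the XML export can still be a source for the text alone. The sketch below is only an illustration in Python, assuming the export follows the usual WordPress layout (an RSS-style file with one `item` per post and the body in a `content:encoded` element). The `SAMPLE` data and the `iter_posts` name are invented for the example, and the tag stripping is the same crude regex approach mentioned earlier in the thread.

```python
import html
import re
import xml.etree.ElementTree as ET

# Namespace used for <content:encoded> in WordPress export files.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Hypothetical, heavily trimmed stand-in for a real export file.
SAMPLE = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Hello</title>
      <content:encoded><![CDATA[<p>First <em>post</em> text.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def iter_posts(root):
    """Yield (title, plain_text) for every <item> under the given root."""
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext(CONTENT_NS + "encoded", default="")
        # Crude tag removal followed by entity decoding.
        text = html.unescape(re.sub(r"<[^>]+>", "", body)).strip()
        yield title, text


# For a real file you would parse it from disk instead, for example:
#   root = ET.parse("export.xml").getroot()   # "export.xml" is a placeholder name
root = ET.fromstring(SAMPLE)
for title, text in iter_posts(root):
    print(title)
    print(text)
```

Each post comes out as a title plus plain text, which could then be written to one text file per post.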


My guess is that you have a lot of pages so that manually doing a select all and then copy would be too tedious?




Terrie
Contents copyright 2004–2019 Desktop Publishing Forum and its members.