I’m sure most of us who have worked with the Siku quanshu 四庫全書 database have dreamed of extracting the text of whole books without having to copy them page by page. It turns out that some books are indeed available as text files here. The list is not complete, but it includes quite a few books from the History (史), Philosophy (子), and Literature (集) sections. Some books from the Classics (經) section are also available, but not that many.

If you are lucky enough to find the title you are interested in, you might need to convert the encoding of the downloaded file before you can see the text properly. A tool that I have found handy is Encoding Master. Use it to open the file and convert it from DOS Chinese Simplified (GBK) to UTF-8. Now you have a clean text file of your favorite book!
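If you would rather script the conversion than use a GUI tool, the same GBK to UTF-8 step can be sketched in a few lines of Python. This is only a minimal sketch: the function name `convert_to_utf8` and its parameters are my own invention for illustration, and it assumes the downloaded file really is GBK-encoded throughout.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="gbk"):
    """Re-encode a text file from source_encoding to UTF-8.

    Hypothetical helper for illustration; assumes the whole file
    is validly encoded in source_encoding (GBK by default).
    """
    # Read raw bytes and decode explicitly, so the platform's
    # default encoding never gets involved.
    text = Path(src).read_bytes().decode(source_encoding)
    # Write UTF-8 bytes directly; line endings stay as they were.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Calling `convert_to_utf8("book.txt", "book-utf8.txt")` (both file names are placeholders) writes a UTF-8 copy next to the original. If a file contains bytes that are not valid GBK, the decode step raises `UnicodeDecodeError` rather than silently producing garbage; in that case passing `source_encoding="gb18030"`, a superset of GBK, may be worth a try.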

Some points of caution:

  • The text is in simplified Chinese.
  • This comes from an online forum, so it can disappear anytime (although apparently it has been there for over a year now).
  • Not all files are from Siku quanshu. (Read the disclaimer in Paragraph #6 at the top of the page.) In any case, the files come from totally unknown sources, so they are as reliable as Wikipedia.
  • Depending on your understanding of the copyright status of digitized old texts, you might feel guilty using these files.

This is the most comprehensive list of clean Siku quanshu texts that I have seen so far. If anyone knows of a better source, I’d appreciate the information very much.

  1. I’m sure the authors won’t claim their copyrights.

  2. No, they wouldn’t, but the company that originally digitized these texts might. I think it’s quite a tricky matter: on the one hand, the texts themselves are certainly in the public domain; on the other hand, a lot of time and money goes into digitizing them. If it becomes too easy for such “pirated” versions to circulate, the companies that specialize in this kind of project might lose the incentive to continue.
    But of course, it’s all too tempting to grab good digital sources as long as they are available, so I just pretend that they are in the public domain for now…

  3. fanren8 / 2012/04/13


  4. Thank you for your information. I have just downloaded some of the text files listed in the jing section. I appreciate it very much. One comment: the link to Encoding Master does not exist anymore. I just opened the files with my text editor (EmEditor) and saved them as UTF-8, and it worked.

  5. Thanks very much for notifying me about the broken link! I have just fixed it, although I’m not sure whether this one might also disappear after some time. Hopefully it will stay for a while.


