Speed-Copying Annotations from Scripta Sinica (A Tip for Mac Users)

The Scripta Sinica Database 漢籍電子文獻資料庫 is a treasure house for scholars of premodern China, especially those fortunate enough to have an institutional subscription to the proprietary portions. I use this database regularly, but occasionally I run into a small technical problem when copying texts from it (in my case, always as plain text): what to do with the interlinear annotations that appear here and there, like the following:

Scripta 1

Scripta 2

If I copy the text before expanding the annotations, I will have the main text but not the annotations. If I copy everything after expanding the annotations, I get a text that contains both the main text and the annotations jumbled together. Needless to say, this would be very confusing to read.

Ideally, I want to save a copy of the main text followed by all annotations at the bottom, like this:


Up to now, I have always copy-pasted the annotations one by one. But recently, I became irritated by the time-consuming process. So I started asking: How can I copy all annotations at once and save my precious time?

Fortunately, it turns out that there is an easy trick for Mac users. (I’m sure similar solutions are available on Windows. If anyone is aware of one, please let me know.)

  1. After expanding the annotations, copy everything and paste it into TextEdit:
    Scripta 3
  2. Select one or more characters from any of the annotations:
    Scripta 4
  3. Go to Format>Font>Styles. You should see a window like the following. Click on “Select.”
    Scripta 5
  4. Check “Select by style” (uncheck everything else), and select “Select within entire document” (default setting). Now click on “Select.”
    Scripta 6
  5. All the annotations should be automatically selected. Copy and paste into your working file, and voilà, you have all the annotations. Happy copying!
    Scripta 7

Downloading Siku quanshu Text Files

I’m sure most of us who have worked with the Siku quanshu 四庫全書 database have dreamed of extracting texts of whole books without having to copy them page by page. It turns out that some books are indeed available as text files here. The list is not complete, but it includes quite a few books from the History (史), Philosophy (子), and Literature (集) sections. Some books from the Classics (經) section are also available, but not that many.

If you are lucky enough to find the title you are interested in, you might need to convert the encoding of the downloaded file before you can see the text properly. A tool that I have found handy is Encoding Master. Use it to open the file, and convert from DOS Chinese Simplified (GBK) to UTF-8. Now you have a clean text file of your favorite book!
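For those who prefer a script to a separate app, the same conversion can probably be done with a few lines of Python. This is only a minimal sketch, assuming the downloaded file really is GBK-encoded as described above; the function name `convert_gbk_to_utf8` and the file names in the usage line are made up for illustration.

```python
import sys
from pathlib import Path


def convert_gbk_to_utf8(src, dst):
    """Read a GBK-encoded text file and save a UTF-8 copy.

    Assumes the source file is GBK; decoding raises UnicodeDecodeError
    if the file contains bytes that are not valid GBK.
    """
    raw = Path(src).read_bytes()
    text = raw.decode("gbk")
    # Write bytes directly so the original line endings are kept as they are.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


if __name__ == "__main__":
    # Hypothetical usage: python convert.py book_gbk.txt book_utf8.txt
    convert_gbk_to_utf8(sys.argv[1], sys.argv[2])
```

If decoding fails, the file may be in a different encoding. Python's `gb18030` codec covers a larger character set than `gbk` and might be worth trying in its place.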

Some points of caution:

  • The text is in simplified Chinese.
  • This comes from an online forum, so it can disappear at any time (although apparently it has been there for over a year now).
  • Not all files are from Siku quanshu. (Read the disclaimer in Paragraph #6 at the top of the page.) In any case, the files come from totally unknown sources, so they are as reliable as Wikipedia.
  • Depending on your understanding of the copyright of digitized old texts, you might feel guilty using these files.

This is the most comprehensive list of clean Siku quanshu texts that I have seen so far. If anyone knows of a better source, I’d appreciate the information very much.