# CHARACTER_SET defines the display character set, i.e., the character set
# assumed to be installed on the user's terminal. It determines which
# characters or strings will be used to represent 8-bit character entities
# within HTML. New character sets may be defined as explained in the README
# files of the src/chrtrans directory in the Lynx source code distribution.
# For Asian (CJK) character sets, it also determines how Kanji code will be
# handled. The default is defined in userdefs.h and can be changed here or
# via the 'o'ptions menu. The 'o'ptions menu setting will be stored in the
# user's RC file whenever those settings are saved, and thereafter will be
# used as the default. For Lynx a "character set" has two names: a MIME name
# (for recognizing properly labeled charset parameters in HTTP headers etc.),
# and a human-readable string for the 'O'ptions Menu (which may also tell you
# the language or group of languages, in addition to the MIME name). Not all
# 'human-readable' names correspond to exactly one valid MIME charset (for
# example, "Chinese"); in that case an appropriate valid (and more specific)
# MIME name should be used where required. Well-known synonyms are also
# processed in the code.
#
# Raw (CJK) mode
#
# Lynx normally translates characters from a document's charset to the
# display charset, using the ASSUME_CHARSET value (see below) if the
# document's charset is not specified explicitly. Raw (CJK) mode is OFF in
# this case. When the document charset is specified explicitly, that charset
# overrides any assumption such as ASSUME_CHARSET or raw (CJK) mode.
#
# For the Asian (CJK) display character sets, the corresponding charset is
# assumed in documents, i.e., raw (CJK) mode is ON by default. In raw CJK
# mode, 8-bit characters are not reverse translated in relation to the entity
# conversion arrays, i.e., they are assumed to be appropriate for the display
# character set. The mode should be toggled OFF when an Asian (CJK) display
# character set is selected but the document is not CJK and its charset is
# not specified explicitly.
#
# Raw (CJK) mode may be toggled by the user via the '@' (LYK_RAW_TOGGLE) key,
# the -raw command line switch, or the 'o'ptions menu.
#
# Raw (CJK) mode effectively changes the charset assumption about unlabeled
# documents. You can toggle raw mode ON if you believe the document has a
# charset which does correspond to your Display Character Set. On the other
# hand, if you set ASSUME_CHARSET to the same value as the Display Character
# Set, raw mode is ON by default (but toggling raw mode OFF afterwards gives
# you assume_charset=iso-8859-1).
#
# Note that "raw" does not mean that every byte will be passed to the screen.
# HTML character entities may get expanded and translated, inappropriate
# control characters filtered out, etc. There is a "Transparent" pseudo
# character set for more "rawness".
#
# Since Lynx now supports a wide range of platforms, it may be useful to note
# the cpXXX codepages used by IBM PC compatible computers, and the
# windows-xxxx codepages used by native MS-Windows apps. Note also that cpXXX
# pages are rarely found on the Internet; they are mostly used for local
# needs on DOS.
#
# Recognized character sets include:
#
#     string for 'O'ptions Menu          MIME name
#     ===========================        =========
#     7 bit approximations (US-ASCII)    us-ascii
#     Western (ISO-8859-1)               iso-8859-1
#     Western (ISO-8859-15)              iso-8859-15
#     Western (cp850)                    cp850
#     Western (windows-1252)             windows-1252
#     IBM PC US codepage (cp437)         cp437
#     DEC Multinational                  dec-mcs
#     Macintosh (8 bit)                  macintosh
#     NeXT character set                 next
#     HP Roman8                          hp-roman8
#     Chinese                            euc-cn
#     Japanese (EUC-JP)                  euc-jp
#     Japanese (Shift_JIS)               shift_jis
#     Korean                             euc-kr
#     Taipei (Big5)                      big5
#     Vietnamese (VISCII)                viscii
#     Eastern European (ISO-8859-2)      iso-8859-2
#     Eastern European (cp852)           cp852
#     Eastern European (windows-1250)    windows-1250
#     Latin 3 (ISO-8859-3)               iso-8859-3
#     Latin 4 (ISO-8859-4)               iso-8859-4
#     Baltic Rim (ISO-8859-13)           iso-8859-13
#     Baltic Rim (cp775)                 cp775
#     Baltic Rim (windows-1257)          windows-1257
#     Celtic (ISO-8859-14)               iso-8859-14
#     Cyrillic (ISO-8859-5)              iso-8859-5
#     Cyrillic (cp866)                   cp866
#     Cyrillic (windows-1251)            windows-1251
#     Cyrillic (KOI8-R)                  koi8-r
#     Arabic (ISO-8859-6)                iso-8859-6
#     Arabic (cp864)                     cp864
#     Arabic (windows-1256)              windows-1256
#     Greek (ISO-8859-7)                 iso-8859-7
#     Greek (cp737)                      cp737
#     Greek2 (cp869)                     cp869
#     Greek (windows-1253)               windows-1253
#     Hebrew (ISO-8859-8)                iso-8859-8
#     Hebrew (cp862)                     cp862
#     Hebrew (windows-1255)              windows-1255
#     Turkish (ISO-8859-9)               iso-8859-9
#     North European (ISO-8859-10)       iso-8859-10
#     Ukrainian Cyrillic (cp866u)        cp866u
#     Ukrainian Cyrillic (KOI8-U)        koi8-u
#     UNICODE (UTF-8)                    utf-8
#     RFC 1345 w/o Intro                 mnemonic+ascii+0
#     RFC 1345 Mnemonic                  mnemonic
#     Transparent                        x-transparent
#
# The value should be the MIME name of a character set recognized by
# Lynx (case insensitive).
# Find RFC 1345 at http://www.ics.uci.edu/pub/ietf/uri/rfc1345.txt .
#
#CHARACTER_SET:iso-8859-1
CHARACTER_SET:utf-8
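# The two-name scheme described above (MIME name plus 'O'ptions Menu string)
# can be illustrated with a minimal Python sketch. The helpers parse_setting()
# and menu_string_for() are hypothetical, CHARSET_TABLE holds only a few rows
# of the table, and Lynx's own handling of well-known synonyms is not
# modelled.

```python
# (menu string, MIME name): a small subset of the table of recognized charsets.
CHARSET_TABLE = [
    ("7 bit approximations (US-ASCII)", "us-ascii"),
    ("Western (ISO-8859-1)", "iso-8859-1"),
    ("Japanese (Shift_JIS)", "shift_jis"),
    ("Cyrillic (KOI8-R)", "koi8-r"),
    ("UNICODE (UTF-8)", "utf-8"),
    ("Transparent", "x-transparent"),
]


def parse_setting(line):
    """Split an active 'KEY:value' line; comment lines give None."""
    if line.startswith("#") or ":" not in line:
        return None
    key, value = line.split(":", 1)
    return key.strip(), value.strip()


def menu_string_for(mime_name):
    """Case-insensitive lookup of a CHARACTER_SET value; None if unknown."""
    wanted = mime_name.strip().lower()
    for menu_string, mime in CHARSET_TABLE:
        if mime == wanted:
            return menu_string
    return None
```

# For the active line above, parse_setting("CHARACTER_SET:utf-8") gives
# ("CHARACTER_SET", "utf-8"), and menu_string_for("UTF-8") gives the menu
# string "UNICODE (UTF-8)".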
# LOCALE_CHARSET overrides CHARACTER_SET if true, using the current locale to
# look up the corresponding MIME name and using that as the display charset.
#
# Note that while nl_langinfo(CODESET) itself is standardized, the return
# values and their relationship to the locale value are not. GNU libiconv
# happens to give useful values, but other implementations are not guaranteed
# to do this.
#LOCALE_CHARSET:FALSE
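# A rough Python sketch of what LOCALE_CHARSET:TRUE does, assuming a POSIX
# system where locale.nl_langinfo() is available. The CODESET_TO_MIME table
# and both helper names are hypothetical; because codeset spellings vary
# between implementations (see the note above), an unknown codeset falls
# back to the CHARACTER_SET value.

```python
import locale

# Hypothetical sample of codeset spellings mapped to Lynx MIME names.
CODESET_TO_MIME = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "iso-8859-1": "iso-8859-1",
    "iso8859-1": "iso-8859-1",
    "koi8-r": "koi8-r",
    "ansi_x3.4-1968": "us-ascii",  # glibc's name for ASCII in the "C" locale
}


def codeset_to_mime(codeset, character_set):
    """Return the MIME name for a codeset, else keep CHARACTER_SET."""
    return CODESET_TO_MIME.get(codeset.strip().lower(), character_set)


def locale_display_charset(character_set):
    """Emulate LOCALE_CHARSET:TRUE (POSIX only: needs nl_langinfo)."""
    locale.setlocale(locale.LC_CTYPE, "")         # adopt the user's locale
    codeset = locale.nl_langinfo(locale.CODESET)  # e.g. "UTF-8"
    return codeset_to_mime(codeset, character_set)
```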
# ASSUME_CHARSET changes the handling of documents which do not
# explicitly specify a charset. Normally Lynx assumes that 8-bit
# characters in those documents are encoded according to iso-8859-1
# (the official default for the HTTP protocol). When ASSUME_CHARSET
# is defined here, or an -assume_charset command line flag is in effect,
# Lynx will treat such documents as if they were encoded accordingly.
# See above on how this interacts with "raw mode" and the Display
# Character Set.
# ASSUME_CHARSET can also be changed via the 'o'ptions menu, but it will
# not be saved as a permanent value in the user's .lynxrc file, to avoid
# more chaos.
#
#ASSUME_CHARSET:iso-8859-1
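# The interplay between an explicit charset, raw (CJK) mode and
# ASSUME_CHARSET, as described in the CHARACTER_SET notes, can be modelled
# roughly as follows. This is a simplified Python sketch with hypothetical
# function names, not Lynx's actual code path, and it only covers documents
# fetched from a server.

```python
# CJK display charsets taken from the table of recognized character sets.
CJK_DISPLAY_CHARSETS = {"euc-cn", "euc-jp", "shift_jis", "euc-kr", "big5"}


def raw_mode_default(display_charset, assume_charset):
    """Raw mode starts ON for CJK display charsets, or when
    ASSUME_CHARSET equals the display charset."""
    display = display_charset.lower()
    return display in CJK_DISPLAY_CHARSETS or assume_charset.lower() == display


def document_charset(declared, display_charset,
                     assume_charset="iso-8859-1", raw_mode=None):
    """Charset used to read a document (simplified model).

    raw_mode=None means "not toggled by the user": use the default.
    """
    if declared:                      # an explicit charset always wins
        return declared.lower()
    if raw_mode is None:
        raw_mode = raw_mode_default(display_charset, assume_charset)
    if raw_mode:                      # bytes assumed to suit the display
        return display_charset.lower()
    assumed = assume_charset.lower()
    if assumed == display_charset.lower():
        return "iso-8859-1"           # raw mode toggled OFF in this case
    return assumed
```

# For example, with an euc-jp display charset an unlabeled page is read as
# euc-jp, but after toggling raw mode OFF it is read as iso-8859-1.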
# It is possible to reduce the number of charset choices in the 'O'ptions menu
# for the "display charset" and "assumed document charset" fields via the
# DISPLAY_CHARSET_CHOICE and ASSUMED_DOC_CHARSET_CHOICE settings,
# respectively. Each of these settings can be used several times to define
# the set of possible choices for the corresponding field. The syntax for
# the values is
#
#     string | prefix* | *
#
# where
#
#     'string' is either the MIME name of a charset or its full name (listed
#         in either the left or the right column of the table of recognized
#         charsets), case-insensitive, e.g., 'Koi8-R' or 'Cyrillic (KOI8-R)'
#         (both without quotes),
#
#     'prefix' is any string, and such a value will select all charsets
#         whose name begins with the given prefix (case-insensitive), i.e.,
#         for the charsets listed in the table of recognized charsets,
#
#             ASSUMED_DOC_CHARSET_CHOICE:cyrillic*
#
#         will be equal to specifying
#
#             ASSUMED_DOC_CHARSET_CHOICE:cp866
#             ASSUMED_DOC_CHARSET_CHOICE:windows-1251
#             ASSUMED_DOC_CHARSET_CHOICE:koi8-r
#             ASSUMED_DOC_CHARSET_CHOICE:iso-8859-5
#
#         or lines with the full names of these charsets,
#
#     the literal string '*' (without quotes) will enable all charset
#         choices in the corresponding field. This is useful for overriding
#         site defaults in private pieces of lynx.cfg included via the
#         INCLUDE directive.
#
# The default value for both settings is '*', but any occurrence of a
# setting with values that denote specific charsets makes only the listed
# choices available for the corresponding field.
#ASSUMED_DOC_CHARSET_CHOICE:*
#DISPLAY_CHARSET_CHOICE:*
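# A minimal Python sketch of how "string | prefix* | *" values could select
# menu entries. allowed_choices() and the CHARSETS subset are hypothetical,
# and matching a prefix against both the full name and the MIME name is an
# assumption.

```python
# (full name, MIME name): a subset of the table of recognized charsets.
CHARSETS = [
    ("Western (ISO-8859-1)", "iso-8859-1"),
    ("Cyrillic (ISO-8859-5)", "iso-8859-5"),
    ("Cyrillic (cp866)", "cp866"),
    ("Cyrillic (windows-1251)", "windows-1251"),
    ("Cyrillic (KOI8-R)", "koi8-r"),
    ("Ukrainian Cyrillic (KOI8-U)", "koi8-u"),
    ("UNICODE (UTF-8)", "utf-8"),
]


def allowed_choices(values, table=CHARSETS):
    """Return the MIME names enabled by a list of *_CHARSET_CHOICE values."""
    if not values:                    # no setting given: default is '*'
        return [mime for _, mime in table]
    selected = set()
    for value in values:
        v = value.strip().lower()
        for full, mime in table:
            names = (full.lower(), mime.lower())
            if v == "*":
                hit = True
            elif v.endswith("*"):     # prefix match, case-insensitive
                hit = any(n.startswith(v[:-1]) for n in names)
            else:                     # exact MIME name or full name
                hit = v in names
            if hit:
                selected.add(mime)
    return [mime for _, mime in table if mime in selected]  # keep table order
```

# With this table, allowed_choices(["cyrillic*"]) yields the four Cyrillic
# charsets but not KOI8-U, whose full name starts with "Ukrainian".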
# ASSUME_LOCAL_CHARSET is like ASSUME_CHARSET but only applies to local
# files. If no setting is given here or by an -assume_local_charset
# command line option, the value for ASSUME_CHARSET or -assume_charset
# is used. It works for both text/plain and text/html files.
# This option will ignore "raw mode" toggling when local files are viewed
# (it is "stronger" than "assume_charset" or the effective change
# of the charset assumption caused by changing "raw mode"),
# so use it only when necessary.
#
#ASSUME_LOCAL_CHARSET:iso-8859-1
ASSUME_LOCAL_CHARSET:utf-8
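# For local files the lookup order changes: an explicit charset still wins,
# but ASSUME_LOCAL_CHARSET is applied before raw mode is consulted. A
# simplified Python sketch with a hypothetical local_file_charset() helper;
# the fallback branch is deliberately reduced to "raw mode, else
# ASSUME_CHARSET".

```python
def local_file_charset(declared, assume_local_charset, assume_charset,
                       raw_mode, display_charset):
    """Charset used to read a local file (simplified model)."""
    if declared:                      # e.g. a META CHARSET in the file
        return declared.lower()
    if assume_local_charset:          # overrides raw mode for local files
        return assume_local_charset.lower()
    # No ASSUME_LOCAL_CHARSET: fall back to the ASSUME_CHARSET handling.
    if raw_mode:
        return display_charset.lower()
    return (assume_charset or "iso-8859-1").lower()
```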
# PREPEND_CHARSET_TO_SOURCE:TRUE tells Lynx to prepend a META CHARSET line
# to text/html source files when they are retrieved for 'd'ownloading
# or passed to 'p'rint functions, so that the charset information from the
# HTTP headers is not lost.
# This is necessary for resolving the charset of local HTML files,
# since ASSUME_LOCAL_CHARSET is just an assumption.
# For the 'd'ownload option, a META CHARSET will be added only if the HTTP
# charset is present. The compilation default is TRUE.
# It is generally desirable to have charset information for every local
# HTML file, but a META CHARSET string could potentially cause
# compatibility problems with other browsers; see also PREPEND_BASE_TO_SOURCE.
# Note that the prepending is not done for -source dumps.
#
#PREPEND_CHARSET_TO_SOURCE:TRUE
PREPEND_CHARSET_TO_SOURCE:TRUE
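# A minimal Python sketch of the 'd'ownload rule above. The helper name
# prepend_meta_charset() and the exact META element text are illustrative
# assumptions, not necessarily the string Lynx writes.

```python
def prepend_meta_charset(source, http_charset):
    """Download case: add a META CHARSET only if HTTP supplied a charset."""
    if not http_charset:
        return source                 # nothing reliable to record
    meta = ('<META http-equiv="Content-Type" '
            f'content="text/html; charset={http_charset}">\n')
    return meta + source
```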
# NCR_IN_BOOKMARKS:TRUE allows you to save 8-bit characters in bookmark titles
# as Unicode numeric character references (NCRs). This may be useful if you
# need to switch display charsets frequently, e.g., when you use Lynx on
# different platforms (on UNIX and from a remote PC) and want to keep the
# bookmarks file persistent.
# Another aspect is compatibility: NCRs are part of the I18N and HTML 4.0
# specifications, supported starting with Lynx 2.7.2, Netscape 4.0 and
# MSIE 4.0. Older browser versions will fail, so keep NCR_IN_BOOKMARKS:FALSE
# if you plan to use them.
#
#NCR_IN_BOOKMARKS:FALSE
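# Numeric character references can be produced with Python's standard
# "xmlcharrefreplace" error handler, which writes decimal references such
# as &#233;. The helper title_to_ncr() is hypothetical, and whether Lynx
# itself writes decimal or hexadecimal references is not stated here.

```python
def title_to_ncr(raw_title, display_charset):
    """Decode a bookmark title and write non-ASCII characters as NCRs."""
    text = raw_title.decode(display_charset)
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

# The same title saved from an iso-8859-1 or a cp850 display gives the same
# NCR, because the reference names the Unicode code point rather than a
# byte value in either charset.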
# FORCE_8BIT_TOUPPER overrides locale settings and uses an internal 8-bit
# case-conversion mechanism for case-insensitive searches in non-ASCII display
# character sets. It is FALSE by default and should not be changed unless
# you encounter problems with case-insensitive searches.
#
#FORCE_8BIT_TOUPPER:FALSE
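# To illustrate what a locale-independent 8-bit case-conversion table looks
# like, here is a Python sketch for ISO-8859-1 only. TOUPPER and find_ci()
# are hypothetical; Lynx's own mechanism is not shown here and may differ.

```python
def latin1_upper_table():
    """256-byte ISO-8859-1 upper-case table for bytes.translate()."""
    table = bytearray(range(256))
    for b in range(ord("a"), ord("z") + 1):
        table[b] = b - 0x20
    for b in range(0xE0, 0xFF):       # 0xE0..0xFE: a-grave .. thorn
        if b != 0xF7:                 # 0xF7 is the division sign
            table[b] = b - 0x20
    # 0xDF (sharp s) and 0xFF (y-diaeresis) have no upper-case form in
    # ISO-8859-1, so they stay unchanged.
    return bytes(table)


TOUPPER = latin1_upper_table()


def find_ci(haystack, needle):
    """Case-insensitive byte search; returns the index or -1."""
    return haystack.translate(TOUPPER).find(needle.translate(TOUPPER))
```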
# While Lynx supports different platforms and display character sets,
# we need to limit the charset in outgoing mail to reduce
# trouble for remote recipients who may not recognize our charset.
# You may try US-ASCII as the safest value (7 bit), any other MIME name,
# or leave this field blank (default) to use the display character set.
# Charset translations are currently implemented only for mail "subjects= ".
#
#OUTGOING_MAIL_CHARSET:
OUTGOING_MAIL_CHARSET:us-ascii
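# A Python sketch of the charset choice and of a US-ASCII subject fallback.
# Both helpers are hypothetical, and Unicode NFKD decomposition is swapped in
# here as a simple way to strip accents; Lynx has its own "7 bit
# approximations" charset for this purpose.

```python
import unicodedata


def outgoing_charset(setting, display_charset):
    """A blank OUTGOING_MAIL_CHARSET means: use the display charset."""
    setting = (setting or "").strip()
    return setting.lower() if setting else display_charset.lower()


def translate_subject(subject, charset):
    """Convert a mail subject for the outgoing charset."""
    if charset == "us-ascii":
        # NFKD splits an accented letter into base letter + combining mark;
        # encoding to ASCII with "ignore" then drops the marks.
        decomposed = unicodedata.normalize("NFKD", subject)
        return decomposed.encode("ascii", "ignore").decode("ascii")
    # Other charsets: unrepresentable characters become '?'.
    return subject.encode(charset, "replace").decode(charset)
```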
# If Lynx encounters a charset parameter it doesn't recognize, it will
# substitute the value given by ASSUME_UNREC_CHARSET (or by the
# corresponding -assume_unrec_charset command line option). This can be used
# to deal with charsets unknown to Lynx, if they are "sufficiently
# similar" to one that Lynx does know about, by forcing the same
# treatment. There is no default, and you should probably leave this
# undefined unless necessary.
#
#ASSUME_UNREC_CHARSET:iso-8859-1
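# The substitution can be sketched in Python as below. RECOGNIZED is a
# hypothetical stand-in for Lynx's list of known charsets, and
# effective_label() is an illustrative name; what Lynx does with an
# unrecognized label when no substitute is configured is not modelled.

```python
# Hypothetical stand-in for the charsets Lynx recognizes.
RECOGNIZED = {"us-ascii", "iso-8859-1", "iso-8859-2", "koi8-r", "utf-8"}


def effective_label(charset_param, assume_unrec_charset=None):
    """Replace an unrecognized charset parameter by ASSUME_UNREC_CHARSET."""
    label = charset_param.strip().lower()
    if label in RECOGNIZED or not assume_unrec_charset:
        return label                  # known, or no substitution configured
    return assume_unrec_charset.strip().lower()
```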
# PREFERRED_LANGUAGE is the language in MIME notation (e.g., "en",
# "fr") which will be indicated by Lynx in its Accept-Language headers
# as the preferred language. If available, the document will be
# transmitted in that language. Users can override this setting via
# the 'o'ptions menu and save that preference in their RC file.
# This may be a comma-separated list of languages in decreasing preference.
#
#PREFERRED_LANGUAGE:en
PREFERRED_LANGUAGE:en
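# A Python sketch of turning a comma-separated PREFERRED_LANGUAGE value into
# an Accept-Language header. The helper name and the decreasing q-values
# (0.9, 0.8, ...) are assumptions about how the preference order might be
# expressed, not necessarily the exact header Lynx sends.

```python
def accept_language(preferred_language):
    """Build an Accept-Language header from e.g. "de, en, fr"."""
    langs = [lang.strip() for lang in preferred_language.split(",")
             if lang.strip()]
    parts = []
    for i, lang in enumerate(langs):
        if i == 0:
            parts.append(lang)        # first choice: implicit q=1
        else:
            q = max(10 - i, 1) / 10   # 0.9, 0.8, ... never below 0.1
            parts.append(f"{lang};q={q:.1f}")
    return "Accept-Language: " + ", ".join(parts)
```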
# PREFERRED_CHARSET specifies the character set in MIME notation (e.g.,
# "ISO-8859-2", "ISO-8859-5") which Lynx will indicate you prefer in
# requests to http servers using an Accept-Charset header. Users can
# change it via the 'o'ptions menu and save that preference in their RC file.
# The value should NOT include "ISO-8859-1" or "US-ASCII",
# since those values are always assumed by default.
# If a file in that character set is available, the server will send it.
# If no Accept-Charset header is present, the default is that any
# character set is acceptable. If an Accept-Charset header is present,
# and if the server cannot send a response which is acceptable
# according to the Accept-Charset header, then the server SHOULD send
# an error response with the 406 (not acceptable) status code, though
# the sending of an unacceptable response is also allowed. See RFC 2068
# (http://www.ics.uci.edu/pub/ietf/uri/rfc2068.txt).
#
#PREFERRED_CHARSET:
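# The server-side rule quoted above can be sketched in Python. Following
# RFC 2068, ISO-8859-1 is treated as always acceptable (US-ASCII too, as
# the comment above assumes); q-values are ignored for brevity, and the
# helper names are hypothetical.

```python
ALWAYS_ACCEPTABLE = {"iso-8859-1", "us-ascii"}


def charset_acceptable(charset, accept_charset=None):
    """Is `charset` acceptable under an Accept-Charset header value?"""
    if accept_charset is None:        # no header: any charset is acceptable
        return True
    listed = {item.split(";")[0].strip().lower()
              for item in accept_charset.split(",") if item.strip()}
    cs = charset.lower()
    return "*" in listed or cs in listed or cs in ALWAYS_ACCEPTABLE


def choose_response(available, accept_charset=None):
    """Return (status, charset): the first acceptable variant, else 406."""
    for charset in available:
        if charset_acceptable(charset, accept_charset):
            return 200, charset
    return 406, None
```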
# CHARSETS_DIRECTORY specifies the directory with the fonts (glyph data)
# used by Lynx to switch the display font to a font best suited for the
# given document. The font should be in a format understood by the
# platform's TTY display-font-switching API. Currently supported on OS/2 only.
#
# Lynx expects the glyphs for the charset CHARSET with character cell
# size HHHxWWW to be stored in a file HHHxWWW/CHARSET.fnt inside the directory
# specified by CHARSETS_DIRECTORY. E.g., the font for koi8-r sized 14x9
# should be in the file 14x9/koi8-r.fnt.
#
#CHARSETS_DIRECTORY:
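# The file-naming convention can be expressed as a small Python helper.
# font_file() and the "/lynx/fonts" directory are hypothetical; forward
# slashes are used for illustration, whereas OS/2 paths normally use
# backslashes.

```python
from pathlib import PurePosixPath


def font_file(charsets_directory, charset, height, width):
    """Path of the glyph file for `charset` at cell size HEIGHTxWIDTH."""
    return PurePosixPath(charsets_directory) / f"{height}x{width}" / f"{charset}.fnt"
```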
# CHARSET_SWITCH_RULES gives Lynx hints on how to choose the best display
# font for the document encoding. This string is a sequence of chunks, each
# chunk having the following form:
#
#     IN_CHARSET1 IN_CHARSET2 ... IN_CHARSET5 :OUT_CHARSET
#
# For readability, one may insert arbitrary additional punctuation (anything
# but : is ignored). E.g., if Lynx is able to switch only to display charsets
# cp866, cp850, cp852, and cp862, then the following setting may be useful
# (split for readability):
#
# CHARSET_SWITCH_RULES: koi8-r ISO-8859-5 windows-1251 cp866u KOI8-U :cp866,
#     iso-8859-1 windows-1252 ISO-8859-15 :cp850,
#     ISO-8859-2 windows-1250 :cp852,
#     ISO-8859-8 windows-1255 :cp862
#
#CHARSET_SWITCH_RULES:
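# A minimal Python sketch of parsing the rules into a lookup table. The
# token pattern (charset names made of letters, digits, '_', '+' and '-')
# and the helper names are assumptions; other characters except ':' are
# skipped, in the spirit of the punctuation rule described above.

```python
import re

# A charset name, optionally preceded by ':' (which marks an OUT_CHARSET).
TOKEN = re.compile(r"(:?)([A-Za-z0-9][A-Za-z0-9_+-]*)")


def parse_switch_rules(rules):
    """Map each IN_CHARSET (lower-cased) to the OUT_CHARSET that follows it."""
    mapping = {}
    pending = []                      # IN_CHARSETs seen since the last ':'
    for colon, name in TOKEN.findall(rules):
        name = name.lower()
        if colon:                     # ':OUT' closes the current chunk
            for source in pending:
                mapping[source] = name
            pending = []
        else:
            pending.append(name)
    return mapping


def switch_target(rules, document_charset):
    """Display charset to switch to, or None if no rule applies."""
    return parse_switch_rules(rules).get(document_charset.lower())
```

# Using the example setting above, switch_target(rules, "KOI8-U") returns
# "cp866", while a utf-8 document matches no rule and gives None.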