Webtags (preserving history): Difference between revisions

== Encoding ==


You can skip this section, but if a web page does not display the '''°''' symbol correctly, the cause will be that the character set encoding is either not declared or not consistent. Put simply, most modern web pages use UTF-8 encoding, but for historical reasons Cumulus defaults to producing these reports in ISO-8859-1 encoding, and this causes the mismatch. In a little more detail: if you implement a web page to display a Cumulus report, then the HTML of that page, the JavaScript that selects which report to show and inserts it into the HTML, and the report itself must all use the same encoding, otherwise some characters will not display correctly.


With that introduction, you can now choose whether to read the rest of this section, which uses more technical terminology.


Let me explain that technical term: essentially, encoding refers to the character set used by a file.
 
A computer works in binary, and a binary digit (bit) can only be in state 0 or state 1, so a combination of 0 and 1 states needs to be defined for every character you want to represent. What you can include in that character set depends to some extent on how many bits are mapped to each character; and if more than one byte is used per character, each encoding must also define the order in which those bytes are stored.
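The point about ordering when more than one byte is used can be seen with UTF-16, an encoding (not one of the Cumulus options) that stores the degree symbol in two bytes and therefore comes in big-endian and little-endian variants. This short Python sketch shows both byte orders and what happens if the wrong one is assumed.

```python
degree = "\u00b0"  # the degree sign, Unicode code point U+00B0

# The same 16-bit value, stored most-significant byte first or last.
big_endian = degree.encode("utf-16-be")
little_endian = degree.encode("utf-16-le")

print(big_endian.hex())     # 00b0
print(little_endian.hex())  # b000

# Reading little-endian bytes as if they were big-endian gives another character.
wrong = little_endian.decode("utf-16-be")
print(hex(ord(wrong)))      # 0xb000
```

UTF-8, by contrast, defines a single byte order, which is one reason it is the usual choice for web pages.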
 
With any fixed number of bits available, there is a limit to how many characters can be defined, and different organisations might select different characters to include. This is what leads to multiple encoding standards: one encoding might use a particular arrangement of bits to represent the degree symbol, while another uses that same arrangement for a different purpose. The general problem is that unless a file is read using the same encoding that was used to write it, the software reading it cannot know which character to display for certain combinations of bits.
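To make this concrete, the Python sketch below reads one byte value, 0xB0, under three encodings, and then encodes the degree symbol under three encodings. The codecs named ("iso-8859-1", "cp1252", "cp437", "utf-8") are standard Python codec names; only ISO-8859-1 and UTF-8 are Cumulus options, and the others are included purely as examples.

```python
# One byte value, 0xB0 (binary 10110000), read under three different encodings.
raw = bytes([0xB0])

as_latin1 = raw.decode("iso-8859-1")  # ISO-8859-1: the degree sign
as_cp1252 = raw.decode("cp1252")      # Windows-1252: also the degree sign
as_cp437 = raw.decode("cp437")        # original IBM PC code page: a shading block

print(as_latin1, as_cp1252, as_cp437)  # ° ° ░

# Going the other way, each encoding stores the degree sign as different bits.
for codec in ("iso-8859-1", "cp437", "utf-8"):
    print(codec, "°".encode(codec).hex())
# iso-8859-1 b0
# cp437 f8
# utf-8 c2b0
```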
 
This means that when you read a file using a different encoding from the one it was written in, you will probably still find the letters A to Z where you expect them, but even correct case cannot be guaranteed: some encodings put capital letters at lower binary values than lower case letters, and some put capitals at higher binary values.
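As an illustration of capitals sitting at lower or higher values, this Python sketch compares ASCII with IBM's EBCDIC (code page 037, available in Python as the codec "cp037"). EBCDIC is chosen only as an example of an encoding that orders the cases the other way round; it is not something Cumulus uses.

```python
for codec in ("ascii", "cp037"):
    upper_a = "A".encode(codec)[0]  # byte value used for capital A
    lower_a = "a".encode(codec)[0]  # byte value used for lower case a
    order = "capitals lower" if upper_a < lower_a else "capitals higher"
    print(f"{codec}: A={upper_a:#04x} a={lower_a:#04x} ({order})")
# ascii: A=0x41 a=0x61 (capitals lower)
# cp037: A=0xc1 a=0x81 (capitals higher)
```

ISO-8859-1 and UTF-8 both store A to Z exactly as ASCII does, which is why, between those two encodings, only characters such as '''°''' go wrong.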
 
If you use 7 bits, you have 128 combinations, enough for the standard 26 letters in both capitals and lower case, plus 10 digits (0 to 9), some punctuation, and some control characters (like new line, end of file, and so on). If you use 8 bits, a whole byte, you have 256 combinations, and you can start coping with accented letters and with alphabets that don't have 26 letters, and even add some symbols. Once you use more than one byte per character, you have 16 or 32 bits available and can include far more characters; the bigger character sets include lots of symbols, and the biggest even add smilies (emoji).
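The number of combinations for a given number of bits is 2 raised to that number, which a couple of lines of Python confirm. The second loop, an illustration not taken from the original article, shows that UTF-8 is variable length: it uses between one and four bytes depending on the character, with emoji needing the full four.

```python
# Number of distinct bit patterns available for a given number of bits.
for bits in (7, 8, 16, 32):
    print(f"{bits} bits: {2 ** bits:,} combinations")
# 7 bits: 128 combinations
# 8 bits: 256 combinations
# 16 bits: 65,536 combinations
# 32 bits: 4,294,967,296 combinations

# UTF-8 is variable length: it uses 1 to 4 bytes per character.
for ch in ("A", "°", "€", "\U0001F600"):  # the last one is a smiley emoji
    print(f"U+{ord(ch):04X} uses {len(ch.encode('utf-8'))} byte(s)")
# U+0041 uses 1 byte(s)
# U+00B0 uses 2 byte(s)
# U+20AC uses 3 byte(s)
# U+1F600 uses 4 byte(s)
```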
   
   
In April 2014, Steve Loft introduced a choice in Cumulus 1 between ISO-8859-1 encoding (which he used originally) and UTF-8 encoding (to which he had migrated his web page templates) for these reports. This choice remains unchanged in MX. The default is still his original ISO-8859-1 encoding, but be aware that the encoding you select should match the encoding of any web page used to view these reports, and most modern web pages (including the standard web templates provided with both flavours of Cumulus) use UTF-8. The encoding can be selected on the NOAA Settings screen of either Cumulus 1 or MX.
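If you already have reports produced with the ISO-8859-1 setting but your web page uses UTF-8, one option is to convert the existing files. The Python sketch below is not a Cumulus feature, and the function name convert_report_to_utf8 is hypothetical. It is a heuristic: it tries UTF-8 first and otherwise assumes ISO-8859-1, which can decode any byte sequence; a file that already decodes as UTF-8 (including plain ASCII) is left untouched, so running it twice is harmless.

```python
from pathlib import Path


def convert_report_to_utf8(path: Path) -> bool:
    """Hypothetical helper: rewrite a report file as UTF-8.

    Returns True if the file was converted, False if it already decoded as UTF-8.
    """
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (this includes plain ASCII files)
    except UnicodeDecodeError:
        pass
    # ISO-8859-1 maps every byte 0x00-0xFF to a character, so this cannot fail.
    text = raw.decode("iso-8859-1")
    path.write_bytes(text.encode("utf-8"))
    return True
```

For example, convert_report_to_utf8(Path("example-report.txt")) would convert a hypothetical report file in place. Changing the NOAA Settings encoding to UTF-8 is still the simpler fix for reports generated in future.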