Transact-SQL string functions. String functions and operators

Last update: 29.07.2017

The following functions can be used to work with strings in T-SQL:

    LEN: Returns the number of characters in a string, not counting trailing spaces. The string whose length is needed is passed as the parameter:

    SELECT LEN('Apple') -- 5

    LTRIM: Removes leading spaces from a string. Takes the string as a parameter:

    SELECT LTRIM(' Apple') -- 'Apple'

    RTRIM: Removes trailing spaces from a string. Takes the string as a parameter:

    SELECT RTRIM(' Apple ') -- ' Apple'

    CHARINDEX: Returns the position of the first occurrence of a substring in a string (positions are counted from 1; 0 means the substring was not found). The substring is passed as the first parameter, and the string to search in as the second:

    SELECT CHARINDEX('pl', 'Apple') -- 3

    PATINDEX: Returns the position of the first occurrence of a pattern in a string:

    SELECT PATINDEX('%p_e%', 'Apple') -- 3

    LEFT: Returns a specified number of characters from the beginning of a string. The first parameter of the function is the string, and the second is the number of characters to take from the beginning of the string:

    SELECT LEFT('Apple', 3) -- App

    RIGHT: Returns a specified number of characters from the end of a string. The first parameter of the function is the string, and the second is the number of characters to take from the end of the string:

    SELECT RIGHT('Apple', 3) -- ple

    SUBSTRING: Extracts a substring of a specified length from a string, starting at a specific position. The first parameter of the function is the string, the second is the starting position, and the third is the number of characters to extract:

    SELECT SUBSTRING('Galaxy S8 Plus', 8, 2) -- S8

    REPLACE: Replaces one substring with another within a string. The first parameter of the function is the string, the second is the substring to be replaced, and the third is the substring to put in its place:

    SELECT REPLACE('Galaxy S8 Plus', 'S8 Plus', 'Note 8') -- Galaxy Note 8

    REVERSE: Reverses the string:

    SELECT REVERSE('123456789') -- 987654321

    CONCAT: Concatenates two or more strings into one. The strings to be joined are passed as parameters:

    SELECT CONCAT('Tom', ' ', 'Smith') -- Tom Smith

    LOWER: Converts the string to lowercase:

    SELECT LOWER('Apple') -- apple

    UPPER: Converts the string to uppercase:

    SELECT UPPER('Apple') -- APPLE

    SPACE: Returns a string that contains a specified number of spaces
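SPACE is the only function in the list above without an example. The following is a minimal sketch, assuming SQL Server with default settings; the brackets and column aliases are only there to make the whitespace visible:

```sql
SELECT '[' + SPACE(3) + ']'        AS three_spaces,  -- [   ]
       'Tom' + SPACE(1) + 'Smith'  AS full_name,     -- Tom Smith
       LEN('Apple' + SPACE(3))     AS len_result;    -- 5, because LEN ignores trailing spaces
```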

For example, let's take the table:

CREATE TABLE Products (Id INT IDENTITY PRIMARY KEY, ProductName NVARCHAR(30) NOT NULL, Manufacturer NVARCHAR(20) NOT NULL, ProductCount INT DEFAULT 0, Price MONEY NOT NULL);

And when retrieving data, we will use string functions:

SELECT UPPER(LEFT(Manufacturer,2)) AS Abbreviation, CONCAT(ProductName, ' - ', Manufacturer) AS FullProdName FROM Products ORDER BY Abbreviation

Counting the distinct first letters of the ship names in the Ships table gives 11. To find out which letters these are, we can use the CHAR function, which returns the character for a known ASCII code (from 0 to 255).

How, for example, can you find the names of ships that contain a sequence of three characters, the first and last of which are "e"?
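One way to answer this is with PATINDEX, which returns 0 when the pattern is not found, so any positive result marks a match. This is only a sketch and assumes that the Ships table has a name column; whether "E" and "e" are treated as equal depends on the column's collation:

```sql
-- '_' matches exactly one arbitrary character between the two "e" letters
SELECT name
FROM Ships
WHERE PATINDEX('%e_e%', name) > 0;
```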

The counterpart of the LEFT function, RIGHT, returns the specified number of characters from the right end of a string expression:

RIGHT(<string expression>,<number of characters>)

Here, for example, is how you can determine the names of ships that begin and end with the same letter:
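A possible form of such a query, assuming the Ships table with a name column, compares the first and the last character directly:

```sql
-- Note: a one-character name also satisfies this condition
SELECT name
FROM Ships
WHERE LEFT(name, 1) = RIGHT(name, 1);
```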

Here we separate the class name and the ship name with a space. In addition, in order not to repeat the entire construction as a function argument, we use a subquery. The result will look like:

To exclude this case, you can use another useful function, LEN(<string expression>), which returns the number of characters in the string. Let's limit ourselves to the case when the number of characters is greater than one:
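As a sketch under the same assumption (a Ships table with a name column), the length restriction can be added as one more condition:

```sql
SELECT name
FROM Ships
WHERE LEFT(name, 1) = RIGHT(name, 1)
  AND LEN(name) > 1;   -- skip one-character names
```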

The REPLICATE function pads the constant "abcde" with five spaces on the right, which the LEN function does not take into account: in both cases we get 5.
The DATALENGTH function returns the number of bytes in the variable's representation and shows us the difference between the CHAR and VARCHAR types. DATALENGTH will give us 12 for the CHAR type and 10 for the VARCHAR type.
As expected, DATALENGTH for a variable of type VARCHAR returned the actual length of the variable. But why did the result turn out to be 12 for a variable of type CHAR? The point is that CHAR is a fixed-length type. If the value of a variable is shorter than its declared length, and we declared the length as CHAR(12), then the value is "aligned" to the required length by adding trailing spaces.
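The script being discussed is not shown, so here is a reconstruction sketched from the description above. The variable names @chr and @vchr are assumptions; the lengths 12 and the five padding spaces come from the text:

```sql
DECLARE @chr AS CHAR(12), @vchr AS VARCHAR(12);

-- 'abcde' followed by five spaces: 10 characters in total
SELECT @chr  = 'abcde' + REPLICATE(' ', 5),
       @vchr = 'abcde' + REPLICATE(' ', 5);

SELECT LEN(@chr) AS len_chr, LEN(@vchr) AS len_vchr;                    -- 5, 5
SELECT DATALENGTH(@chr) AS bytes_chr, DATALENGTH(@vchr) AS bytes_vchr;  -- 12, 10
```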

There are tasks on the site in which values presented in text format need to be arranged in numerical order (or their maximum found, and so on). Examples are an airplane seat number ("2d") or a CD speed ("24x"). The problem is that the text is sorted like this (ascending)

If you want to arrange the places in ascending order of rows, then the order should be like this

If we limit ourselves to this, we get

All that remains is to sort
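The intermediate queries are not shown here, so the following is only a sketch of the idea: cut off the numeric prefix, convert it to a number, and sort by it. The sample seat values are invented for illustration, and the query assumes SQL Server 2008 or later (for the VALUES table constructor) and that every value ends with at least one non-digit character:

```sql
SELECT seat
FROM (VALUES ('1a'), ('2d'), ('10a'), ('11b'), ('2a')) AS s(seat)  -- hypothetical data
ORDER BY
    -- row number: everything before the first non-digit character
    CAST(LEFT(seat, PATINDEX('%[^0-9]%', seat) - 1) AS INT),
    -- then the letter part, via ordinary text comparison
    seat;
-- order returned: 1a, 2a, 2d, 10a, 11b
```

A plain ORDER BY seat would instead compare the values as text and place '10a' and '11b' before '2a'.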

Here is a complete list of string functions taken from BOL (SQL Server Books Online):

ASCII NCHAR SOUNDEX
CHAR PATINDEX SPACE
CHARINDEX REPLACE STR
DIFFERENCE QUOTENAME STUFF
LEFT REPLICATE SUBSTRING
LEN REVERSE UNICODE
LOWER RIGHT UPPER
LTRIM RTRIM

Let's start with two mutually inverse functions, ASCII and CHAR.

The ASCII function returns the ASCII code of the leftmost character of the string expression that is the function argument.

Here, for example, is how you can determine how many different letters there are that start the names of ships in the Ships table:
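The query itself is missing at this point. A plausible version, assuming the Ships table has a name column, relies on ASCII looking only at the leftmost character of its argument:

```sql
-- Number of different first letters (upper and lower case count separately)
SELECT COUNT(DISTINCT ASCII(name)) AS first_letter_count
FROM Ships;

-- The letters themselves, obtained by converting the codes back with CHAR
SELECT DISTINCT CHAR(ASCII(name)) AS first_letter
FROM Ships
ORDER BY 1;
```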


It should be noted that a similar result can be obtained more easily using another function - LEFT, which has the following syntax:

LEFT (<string expression>, <integer expression>)

It returns, from the left of the string given as the first argument, the number of characters specified by the second argument. So,

SELECT DISTINCT LEFT(name, 1) FROM Ships ORDER BY 1

Here's how, for example, you can get a table of codes for all alphabetic characters:

SELECT CHAR(ASCII('a') + num - 1) AS letter, ASCII('a') + num - 1 AS code
FROM (SELECT 5*5*(a-1) + 5*(b-1) + c AS num
      FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) x
      CROSS JOIN
      (SELECT 1 b UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) y
      CROSS JOIN
      (SELECT 1 c UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) z
     ) x
WHERE ASCII('a') + num - 1 BETWEEN ASCII('a') AND ASCII('z')

For those who are not yet aware of the generation of a number sequence, I refer you to the corresponding article.

As you know, the codes for lowercase and uppercase letters are different. Therefore, to get the full set without rewriting the query, you just need to add a similar one to the above code:


I figure it wouldn't be too difficult to add this letter to the table if needed.

Let us now consider the task of finding a desired substring in a string expression. Two functions can be used for this: CHARINDEX and PATINDEX. They both return the starting position of the substring in the string (the position of its first character). The CHARINDEX function has the syntax:

CHARINDEX( search_expression, string_expression[, start_position])

Here the optional integer parameter start_position defines the position in the string expression from which the search for search_expression begins. If this parameter is omitted, the search starts from the beginning of string_expression.
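The example that belonged here is not shown. As an illustration only, the sketch below searches ship names for a hypothetical substring 'sh' (assuming a Ships table with a name column) and then shows the effect of start_position on a literal:

```sql
-- Ships whose name contains "sh"; CHARINDEX returns 0 when nothing is found
SELECT name
FROM Ships
WHERE CHARINDEX('sh', name) > 0;

-- Searching from the beginning versus from position 3
SELECT CHARINDEX('a', 'California')    AS from_start,  -- 2
       CHARINDEX('a', 'California', 3) AS from_pos_3;  -- 10
```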

It should be noted that if the searched substring or string expression is NULL, then the result of the function will also be NULL.

The following example determines the positions of the first and second occurrences of the character "a" in the ship name "California":

SELECT CHARINDEX('a', name) first_a,
CHARINDEX('a', name, CHARINDEX('a', name) + 1) second_a
FROM Ships WHERE name = 'California'

Please note that to find the second occurrence, the function is given a starting position, namely the position of the character following the first letter "a": CHARINDEX('a', name) + 1. The correctness of the result, 2 and 10, is easy to check :-).

The PATINDEX function has the syntax:

PATINDEX('%pattern%', string_expression)

The main difference between this function and CHARINDEX is that the search string may contain wildcards, % and _. In this case, the enclosing "%" characters are required. For example, using this function in the first example would look like
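The original query is missing at this point. As an illustration only, a PATINDEX search for a hypothetical substring 'sh' in ship names (assuming a Ships table with a name column) could be written as follows, together with a literal example of the "_" wildcard:

```sql
-- Same idea as a CHARINDEX search, but the argument is a pattern
SELECT name
FROM Ships
WHERE PATINDEX('%sh%', name) > 0;

-- "a", then any single character, then "i": matches "ali" starting at position 2
SELECT PATINDEX('%a_i%', 'California') AS pos;  -- 2
```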


The result of this query looks like this:


The fact that we end up with an empty result set means that there are no such ships in the database. Let's take a combination of values instead: the class and name of the ship.

Combining two string values into one is called concatenation, and SQL Server uses the "+" sign for this operation (the standard uses "||"). So,
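A sketch of such a query, assuming the Ships table has class and name columns: the concatenated value gets the alias cn in a subquery, so it does not have to be repeated inside LEFT and RIGHT. The aliases cn and x are illustrative.

```sql
SELECT *
FROM (SELECT class + ' ' + name AS cn   -- class and name separated by a space
      FROM Ships) AS x
WHERE LEFT(cn, 1) = RIGHT(cn, 1);
```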

What if the string expression contains only one letter? The query will return it. You can easily verify this by writing

SQL String Functions

This group of functions allows you to manipulate text. There are many string functions; we will look at the most common ones.
  • CONCAT(str1,str2...) Returns a string created by concatenating the arguments (the arguments are in parentheses - str1,str2...). For example, in our Vendors table there is a City column and an Address column.

    Suppose we want the resulting table to have Address and City in the same column, i.e. we want to combine data from two columns into one. To do this, we will use the CONCAT() string function, and as arguments we will indicate the names of the columns to be combined - city and address:


    Please note that the values were merged without a separator, which is not very readable. Let's adjust our query so that there is a space between the columns being joined:

    SELECT CONCAT(city, ' ', address) FROM vendors;


    As you can see, a space is also considered an argument and is separated by a comma like any other. If there were more columns to be merged, specifying spaces each time would be tedious. In this case, one could use the string function CONCAT_WS(separator, str1,str2...), which places a separator between the concatenated strings (the separator is specified as the first argument). Our query will then look like this:

    SELECT CONCAT_WS(' ', city, address) FROM vendors;

    The result did not change externally, but if we were to join 3 or 4 columns, the code would be significantly reduced.


  • INSERT(str, pos, len, new_str) Returns the string str with the substring that starts at position pos and is len characters long replaced by the substring new_str. Suppose we decide not to display the first 3 characters in the Address column (abbreviations st., pr., etc.); then we will replace them with spaces:

    SELECT INSERT(address, 1, 3, '   ') FROM vendors;


    That is, three characters, starting from the first, are replaced by three spaces.


  • LPAD(str, len, dop_str) Returns the string str, left-padded with dop_str to length len.

    Let's say we want to display supplier cities right-aligned and fill the empty space with dots:

    SELECT LPAD(city, 15, '.') FROM vendors;

  • RPAD(str, len, dop_str) Returns the string str, right-padded with dop_str to length len.

    Let's say we want to display supplier cities left-aligned and fill the empty space with dots:

    SELECT RPAD(city, 15, '.') FROM vendors;

    Please note that the len value limits the number of characters displayed, i.e. if the city name is longer than 15 characters, it will be truncated.


  • LTRIM(str) Returns the string str with all leading spaces removed. This string function is useful for displaying information correctly when stray spaces may have been entered along with the data:

    SELECT LTRIM(city) FROM vendors;

  • RTRIM(str) Returns the string str with all trailing spaces removed:

    SELECT RTRIM(city) FROM vendors;

    In our case, there were no extra spaces, so the output looks unchanged.


  • LOWER(str) Returns the string str with all characters converted to lowercase.

    In our example it does not work correctly with Russian letters, so it is better not to use it for them. For example, let's apply this function to the city column:

    SELECT city, LOWER(city) FROM vendors;

    See what kind of gobbledygook it turned out to be. But everything is fine with the Latin alphabet:

    SELECT LOWER('CITY');

  • UPPER(str) Returns the string str with all characters converted to uppercase.

    It is also better not to use it with Russian letters. But everything is fine with the Latin alphabet:

    SELECT UPPER(email) FROM customers;



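  • LENGTH(str) Returns the length of the string str. In MySQL this length is measured in bytes, so for multibyte character sets it can exceed the number of characters (CHAR_LENGTH counts characters). For example, let's find the length of our supplier addresses:

    SELECT address, LENGTH(address) FROM vendors;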



  • LEFT(str, len) Returns the len leftmost characters of the string str. For example, let only the first three characters be displayed in supplier cities:

    SELECT name, LEFT(city, 3) FROM vendors;

  • RIGHT(str, len) Returns the len rightmost characters of the string str. For example, let only the last three characters be displayed in supplier cities:

    SELECT name, RIGHT(city, 3) FROM vendors;

  • LOAD_FILE(file) Reads the file and returns its contents as a string:

    SELECT LOAD_FILE('C:/proverka');

    Please note that you must specify the absolute path to the file.

As already mentioned, there are many more string functions, but even some of those discussed here are used extremely rarely. Therefore, let’s finish considering them here and move on to more commonly used date and time functions.

Basic string functions and operators provide a variety of capabilities and return a string value as a result. Some string functions are dyadic, meaning they operate on two strings at once. The SQL 2003 standard supports string functions.

Concatenation operator

SQL 2003 defines the concatenation operator (||), which joins two separate strings into a single string value.

DB2 platform

The DB2 platform supports the SQL 2003 concatenation operator as well as its synonym, the CONCAT function.

MySQL platform

The MySQL platform supports the CONCAT() function, a synonym for the SQL 2003 concatenation operator.

Oracle and PostgreSQL

The PostgreSQL and Oracle platforms support the SQL 2003 double vertical bar concatenation operator.

SQL Server platform

The SQL Server platform uses the plus sign (+) as a synonym for the SQL 2003 concatenation operator. SQL Server has a system parameter CONCAT_NULL_YIELDS_NULL that controls how the system behaves if NULL values are encountered when concatenating string values.
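To illustrate the NULL behavior on SQL Server, here is a small sketch. It assumes the default setting, CONCAT_NULL_YIELDS_NULL ON, and SQL Server 2012 or later for the CONCAT function; the column aliases are illustrative:

```sql
SELECT 'abc' + NULL        AS plus_result,    -- NULL: the + operator propagates NULL
       CONCAT('abc', NULL) AS concat_result;  -- abc: CONCAT treats NULL as an empty string
```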

/* SQL 2003 syntax */

string1 || string2

/* For MySQL */

CONCAT('string1', 'string2')

If any of the concatenated values is NULL, the result is NULL. Additionally, if a numeric value is involved in the concatenation, it is implicitly converted to a string value.

SELECT CONCAT('My ', 'bologna ', 'has ', 'a ', 'first ', 'name...');

My bologna has a first name...

SELECT CONCAT('My ', NULL, 'has ', 'first ', 'name...');

NULL

CONVERT and TRANSLATE

The CONVERT function changes the display of a character string within a character set and collation. For example, the CONVERT function can be used to change the number of bits per character.

The TRANSLATE function translates a string value from one character set to another. For example, the TRANSLATE function can be used to convert a value from the English character set to the Kanji (Japanese) or Cyrillic (Russian) character set. The translation itself must already exist - either specified by default or created using the CREATE TRANSLATION command.

SQL 2003 syntax

CONVERT (character_value USING character_conversion_name)

TRANSLATE(character_value USING translation_name)

The CONVERT function converts a character value to the character set with the name specified in the character_conversion_name parameter. The TRANSLATE function converts a character value to the character set specified in translation_name.

Among the platforms reviewed, only Oracle supports the CONVERT and TRANSLATE functions as defined in the SQL 2003 standard. Oracle's implementation of the TRANSLATE function is very similar to, but not identical to, SQL 2003. In this implementation, the function takes only two arguments and translates only between the database character set and the locale-enabled character set.

MySQL's implementation of the CONV function only converts numbers from one base to another. SQL Server's implementation of the CONVERT function is quite rich in capabilities and changes the data type of the expression, but in all other aspects it differs from the CONVERT function of the SQL 2003 standard. The PostgreSQL platform does not support the CONVERT function, and its implementation of the TRANSLATE function converts all occurrences of a character string to any other character string.

DB2

The DB2 platform does not support the CONVERT function, and support for the TRANSLATE function is not ANSI compliant. The TRANSLATE function is used to transform substrings and has historically been synonymous with the UPPER function because the UPPER function was only recently added to DB2. If the TRANSLATE function is used in DB2 with a single argument that is a character expression, the result is the same string converted to uppercase. If the function is used with multiple arguments, such as TRANSLATE(source, replace, match), then the function converts all characters in the source that also occur in the match parameter. Every such character is replaced with the character at the same position in the replace parameter. Below is an example.

TRANSLATE('Hello, World!')    'HELLO, WORLD!'

TRANSLATE('Hello, World!', 'wZ', 'lW')    'Hewwo, Zorwd!'

MySQL

The MySQL platform does not support the TRANSLATE and CONVERT functions.

Oracle

The Oracle platform supports the following syntax for the CONVERT and TRANSLATE functions.

CONVERT(char_value, target_char_set, source_char_set)

TRANSLATE(char_value USING {CHAR_CS | NCHAR_CS})

In Oracle's implementation, the CONVERT function returns the text of a character value converted to target_char_set. The char_value parameter is the string to be converted, target_char_set is the name of the character set into which the string is to be converted, and source_char_set is the character set in which the string value was originally stored.

The TRANSLATE function in Oracle conforms to ANSI syntax, but you can only choose one of two character sets: the database character set (CHAR_CS) and the national language-specific character set (NCHAR_CS).

Oracle also supports another function, also called TRANSLATE (without the USING keyword). This TRANSLATE function has nothing to do with character set conversion.

The names of the target and source character sets can be passed either as string constants or as a reference to a table column. Note that when converting a string to a character set that does not display all the characters being converted, you can substitute replacement characters.

Oracle supports several common character sets, which include US7ASCII, WE8DEC, WE8HP, F7DEC, WE8EBCDIC500, WE8PC850, and WE8ISO8859P1. For example:

SELECT CONVERT('Groß', 'US7ASCII', 'WE8HP') FROM DUAL;

PostgreSQL

The PostgreSQL platform supports the ANSI standard CONVERT statement, and conversions here can be defined using the CREATE CONVERSION command. PostgreSQL's implementation of the TRANSLATE function provides an extended set of functions that allow you to transform any text into other text within a specified string.

TRANSLATE (character string, from_text, to_text)

Here are some examples:

SELECT TRANSLATE('12345abcde', '5a', 'XX');

1234XXbcde

SELECT TRANSLATE(title, 'Computer', 'PC') FROM titles WHERE type = 'Personal_computer';

SELECT CONVERT('PostgreSQL' USING iso_8859_1_to_utf_8);

PostgreSQL

SQL Server

The SQL Server platform does not support the TRANSLATE function (a TRANSLATE function for character replacement was only added in SQL Server 2017). The implementation of the CONVERT function in SQL Server is not compliant with the SQL 2003 standard. This function in SQL Server is similar in purpose to the CAST function.

CONVERT(data_type[(length) | (precision, scale)], expression[, style])

The style clause is used to define the date conversion format. For more information, see the SQL Server documentation. Below is an example.

SELECT title, CONVERT(char(7), ytd_sales) FROM titles ORDER BY title
GO
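To make the role of the style argument more concrete, here is a small sketch with a fixed date. It assumes SQL Server 2008 or later (for the date type); styles 104 and 112 are the documented German and ISO date formats, and the column aliases are illustrative:

```sql
SELECT CONVERT(varchar(10), CAST('2017-07-29' AS date), 104) AS german_style,  -- 29.07.2017
       CONVERT(varchar(8),  CAST('2017-07-29' AS date), 112) AS iso_style,     -- 20170729
       CAST(123.45 AS varchar(10))                           AS cast_result;   -- 123.45
```

CAST covers the plain type change, while CONVERT additionally accepts the style code.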