Remove all special characters from a string

itgroup 2023. 1. 21. 09:44

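I'm having a problem with URLs. I want to convert titles, which can contain anything, by stripping out all special characters so that only letters and numbers remain. Of course, I'd also like to replace spaces with hyphens.

How should this be done? I've heard a lot about regular expressions (regex) being used for this.

This should do what you're looking for: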

function clean($string) {
   $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.

   return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}

Usage:

echo clean('a|"bc!@£de^&$f g');

Will output: abcdef-g

Edit:

Hey, just a quick question: how can I prevent multiple hyphens from ending up next to each other, and have them replaced with a single one?

function clean($string) {
   $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
   $string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.

   return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}
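
As a minimal sketch of the same pipeline, the variant below also trims a hyphen left at either end of the result. The name clean_and_trim() and the trim() step are assumptions added for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): the same three steps
// as clean(), plus removal of leading and trailing hyphens.
function clean_and_trim($string) {
    $string = str_replace(' ', '-', $string);                // spaces -> hyphens
    $string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // drop everything else
    $string = preg_replace('/-+/', '-', $string);            // collapse runs of hyphens
    return trim($string, '-');                               // no hyphen at either end
}

echo clean_and_trim(' Hello,   World -- 100% ok! '), "\n"; // Hello-World-100-ok
```

Without the final trim(), a title that starts or ends with a space or punctuation would produce a slug such as -Hello-World-100-ok-.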

Update

The solution below has a more "SEO friendly" version:

function hyphenize($string) {
    $dict = array(
        "I'm"      => "I am",
        "thier"    => "their",
        // Add your own replacements here
    );
    return strtolower(
        preg_replace(
          array( '#[\\s-]+#', '#[^A-Za-z0-9. -]+#' ),
          array( '-', '' ),
          // the full cleanString() can be downloaded from http://www.unexpectedit.com/php/php-clean-string-of-utf8-chars-convert-to-similar-ascii-char
          cleanString(
              str_replace( // preg_replace can be used to support more complicated replacements
                  array_keys($dict),
                  array_values($dict),
                  urldecode($string)
              )
          )
        )
    );
}

function cleanString($text) {
    $utf8 = array(
        '/[áàâãªä]/u'   =>   'a',
        '/[ÁÀÂÃÄ]/u'    =>   'A',
        '/[ÍÌÎÏ]/u'     =>   'I',
        '/[íìîï]/u'     =>   'i',
        '/[éèêë]/u'     =>   'e',
        '/[ÉÈÊË]/u'     =>   'E',
        '/[óòôõºö]/u'   =>   'o',
        '/[ÓÒÔÕÖ]/u'    =>   'O',
        '/[úùûü]/u'     =>   'u',
        '/[ÚÙÛÜ]/u'     =>   'U',
        '/ç/'           =>   'c',
        '/Ç/'           =>   'C',
        '/ñ/'           =>   'n',
        '/Ñ/'           =>   'N',
        '/–/'           =>   '-', // UTF-8 hyphen to "normal" hyphen
        '/[’‘‹›‚]/u'    =>   ' ', // Literally a single quote
        '/[“”«»„]/u'    =>   ' ', // Double quote
        '/\x{00A0}/u'   =>   ' ', // non-breaking space (U+00A0, decimal 160)
    );
    return preg_replace(array_keys($utf8), array_values($utf8), $text);
}

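The rationale for the functions above (which I find quite inefficient; the one below is better) is that a service that shall remain unnamed apparently ran spell checks and keyword recognition on the URLs.

After losing a long time to a customer's paranoia, I found out that they were not imagining things after all. Their SEO experts [I am definitely not one] reported that converting, say, "Viaggi Economy Perù" to viaggi-economy-peru "behaved better" than viaggi-economy-per (the previous "cleaning" removed UTF8 characters; Bogotà became bogot, Medellìn became medelln, and so on).

There were also some common misspellings that seemed to influence the results. The only explanation that made sense to me is that our URLs were being unpacked, the words singled out, and then fed to who knows what ranking algorithms. Those algorithms had apparently been trained on UTF8-cleaned strings, so that "Perù" became "Peru" instead of "Per". "Per" did not match and the ranking suffered for it.

In order to both keep UTF8 characters and replace some misspellings, the faster function below grew into the more accurate (?) function above. $dict needs to be hand-tailored, of course.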
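
The $dict step can be tried in isolation. The sketch below is one possible reading of the "preg_replace can be used" remark in the code: it matches whole words only, so a dictionary entry does not fire inside a longer word. The helper name apply_dict() and the sample sentence are assumptions made for illustration.

```php
<?php
// Hypothetical helper: apply a hand-made dictionary of corrections,
// matching whole words only (str_replace would also match substrings).
function apply_dict($text, array $dict) {
    foreach ($dict as $wrong => $right) {
        $pattern = '/\b' . preg_quote($wrong, '/') . '\b/u';
        // A callback keeps "$" or "\" in the replacement from being
        // interpreted as a back-reference.
        $text = preg_replace_callback(
            $pattern,
            function () use ($right) { return $right; },
            $text
        );
    }
    return $text;
}

$dict = array(
    "I'm"   => "I am",
    "thier" => "their",
);

echo apply_dict("I'm sure thier bags are thiers", $dict), "\n";
// I am sure their bags are thiers
```

The plain str_replace() call used in hyphenize() is faster, but it would also rewrite "thiers" to "theirs", which may or may not be what you want.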

Previous answer

A simple approach:

// Remove all characters except A-Z, a-z, 0-9, dots, hyphens and spaces
// Note that the hyphen must go last not to be confused with a range (A-Z)
// and the dot, NOT being special (I know. My life was a lie), is NOT escaped

$str = preg_replace('/[^A-Za-z0-9. -]/', '', $str);

// Replace sequences of spaces with hyphen
$str = preg_replace('/  */', '-', $str);

// The above means "a space, followed by a space repeated zero or more times"
// (should be equivalent to / +/)

// You may also want to try this alternative:
$str = preg_replace('/\\s+/', '-', $str);

// where \s+ means "one or more whitespaces" (a space is not necessarily the
// same as a whitespace) just to be sure and include everything

Note that you might first have to urldecode() the URL, since %20 and + both actually stand for spaces. That is, if you have "Never%20gonna%20give%20you%20up", you want it to become "Never-gonna-give-you-up", not "Never20gonna20give20you20up". You might not need this, but I thought I'd mention the possibility.
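
A short sketch of the difference, using a made-up input that mixes %20 and + (the variable names are illustrative only):

```php
<?php
$raw = 'Never%20gonna+give%20you%20up';

// Without decoding, '%' and '+' are stripped and the "20" digits survive.
$naive = preg_replace('/[^A-Za-z0-9. -]/', '', $raw);
echo $naive, "\n"; // Never20gonnagive20you20up

// urldecode() turns both %20 and + into spaces before any cleaning happens.
$decoded = urldecode($raw);
$slug = preg_replace('/\s+/', '-', $decoded);
echo $slug, "\n"; // Never-gonna-give-you-up
```

Note that rawurldecode() leaves + untouched, so urldecode() is the one to use when + should count as a space.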

So the finished function, along with test cases, looks like this:

function hyphenize($string) {
    return 
    ## strtolower(
          preg_replace(
            array('#[\\s-]+#', '#[^A-Za-z0-9. -]+#'),
            array('-', ''),
        ##     cleanString(
              urldecode($string)
        ##     )
        )
    ## )
    ;
}

print implode("\n", array_map(
    function($s) {
            return $s . ' becomes ' . hyphenize($s);
    },
    array(
    'Never%20gonna%20give%20you%20up',
    "I'm not the man I was",
    "'Légeresse', dit sa majesté",
    )));


With strtolower() and cleanString() enabled, this gives:

Never%20gonna%20give%20you%20up    becomes  never-gonna-give-you-up
I'm not the man I was              becomes  im-not-the-man-i-was
'Légeresse', dit sa majesté        becomes  legeresse-dit-sa-majeste

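To handle UTF-8, I used a cleanString implementation that converts UTF8 characters to plain characters, thus preserving the "look" of each word as much as possible. (The link has since gone dead, but a stripped-down copy covering the not-too-esoteric UTF8 characters is at the beginning of the answer, and it is easy to add more characters if you need them.) It could be simplified and wrapped inside the function here for performance.

The function above also implements conversion to lowercase, but that is a matter of taste. The code for it has been commented out.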
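
One way the cleanString step might be simplified is to replace the list of regular expressions with a single strtr() lookup table. This is only a sketch under assumptions: the name clean_string_strtr() is hypothetical, the map covers just a subset of the characters listed earlier, and both the source file and the input are assumed to be UTF-8.

```php
<?php
// Hypothetical alternative to cleanString(): one strtr() pass over a
// lookup table instead of a series of preg_replace() patterns.
function clean_string_strtr($text) {
    static $map = array(
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a',
        'Á' => 'A', 'À' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A',
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'É' => 'E', 'È' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'í' => 'i', 'ì' => 'i', 'î' => 'i', 'ï' => 'i',
        'ó' => 'o', 'ò' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o',
        'ú' => 'u', 'ù' => 'u', 'û' => 'u', 'ü' => 'u',
        'ç' => 'c', 'Ç' => 'C', 'ñ' => 'n', 'Ñ' => 'N',
        '–' => '-', // en dash to plain hyphen
    );
    // With an array argument, strtr() replaces each key wherever it occurs;
    // multi-byte UTF-8 keys work because matching is done on byte sequences.
    return strtr($text, $map);
}

echo clean_string_strtr("'Légeresse', dit sa majesté"), "\n";
// 'Legeresse', dit sa majeste
```

The quotes and comma are left in place here because removing them is the job of the later preg_replace() step in hyphenize().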

Here, check out this function:

function seo_friendly_url($string){
    $string = str_replace(array('[\', \']'), '', $string);
    $string = preg_replace('/\[.*\]/U', '', $string);
    $string = preg_replace('/&(amp;)?#?[a-z0-9]+;/i', '-', $string);
    $string = htmlentities($string, ENT_COMPAT, 'utf-8');
    $string = preg_replace('/&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);/i', '\\1', $string );
    $string = preg_replace(array('/[^a-z0-9]/i', '/[-]+/') , '-', $string);
    return strtolower(trim($string, '-'));
}

Source URL: https://stackoverflow.com/questions/14114411/remove-all-special-characters-from-a-string
