1. What is Aggregation?

Aggregation is OpenSearch's analytics and summary tool: it lets you group, summarize, and run calculations over your data.

Metric Aggregation → numeric calculations (count, sum, avg, min, max)
Bucket Aggregation → groups or categorizes documents

Analogy:
Just as SQL has GROUP BY, SUM, and AVG, OpenSearch does the same work with aggregations.

2. Aggregation Types

### Metric Aggregation

Used to summarize numeric or quantitative data.

Metric Aggregation	Description	Example
avg	Average of a numeric field	Average price of books
sum	Sum of a numeric field	Total sales amount
min	Minimum value	Cheapest book price
max	Maximum value	Most expensive book price
stats	count, min, max, sum, and avg together	Book price statistics
extended_stats	stats + variance and standard deviation	Detailed numeric analysis

i) Average (avg)

👉 Computes the average of a numeric field

{
  "aggs": {
    "avg_price": {
      "avg": { "field": "price" }
    }
  }
}

✅ Use Case:

Average salary
Average product price

ii) Sum

👉 Adds up all values

{
  "aggs": {
    "total_price": {
      "sum": { "field": "price" }
    }
  }
}

✅ Use Case:

Total sales
Total revenue

iii) Min (Minimum)

👉 The smallest value

{
  "aggs": {
    "min_price": {
      "min": { "field": "price" }
    }
  }
}

iv) Max (Maximum)

👉 The largest value

{
  "aggs": {
    "max_price": {
      "max": { "field": "price" }
    }
  }
}

v) Stats

👉 Returns count, min, max, avg, and sum in a single request

{
  "aggs": {
    "price_stats": {
      "stats": { "field": "price" }
    }
  }
}

✅ Output:

count
min
max
avg
sum
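
To make the five output fields concrete, here is a small pure-Python sketch of what avg, sum, min, max, and stats compute. The price list is hypothetical (it is not data from this post), and this only mirrors the arithmetic, not how OpenSearch computes it across shards.

```python
# Hypothetical prices, invented for this sketch.
prices = [100, 120, 140, 180, 220]


def stats(values):
    """Return the same five fields that the stats aggregation reports."""
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": total / count,
        "sum": total,
    }


price_stats = stats(prices)
print(price_stats)
```

Each single-purpose aggregation (avg, sum, min, max) corresponds to one key of this dictionary.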

vi) Extended Stats

👉 Stats + extra analysis

{
  "aggs": {
    "extended_price_stats": {
      "extended_stats": { "field": "price" }
    }
  }
}

✅ Extra fields:

variance
std_deviation

✅ Use Case: Data variability analysis
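
As a rough illustration of the two extra fields, the sketch below computes variance and standard deviation with Python's statistics module. It assumes the population formulas (dividing by n); check the response of your own cluster if the distinction between population and sampling variance matters to you. The prices are hypothetical.

```python
import statistics

# Hypothetical prices, invented for this sketch.
prices = [100, 120, 140, 180, 220]

# Population variance and standard deviation (divide by n, not n - 1).
variance = statistics.pvariance(prices)
std_deviation = statistics.pstdev(prices)

print(variance, round(std_deviation, 2))
```

A large std_deviation relative to the average signals that prices are widely spread.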

vii) Value Count

👉 Counts how many values the field has

{
  "aggs": {
    "price_count": {
      "value_count": { "field": "price" }
    }
  }
}

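⚠️ Note: this counts field values, not documents. A document whose field holds several values adds more than one, and a document without the field adds nothing.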
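
The difference between a document count and a value count is easiest to see on a tiny example. The sketch below uses three hypothetical documents: one with a single price, one with a multi-valued price, and one with no price at all.

```python
# Hypothetical documents, invented for this sketch.
docs = [
    {"title": "A", "price": 100},
    {"title": "B", "price": [120, 140, 180]},  # multi-valued field
    {"title": "C"},                            # field is missing
]


def value_count(documents, field):
    """Count field values: arrays count once per element, missing fields count zero."""
    total = 0
    for doc in documents:
        if field not in doc:
            continue
        value = doc[field]
        total += len(value) if isinstance(value, list) else 1
    return total


doc_count = len(docs)
price_count = value_count(docs, "price")
print(doc_count, price_count)  # 3 4
```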

viii) Cardinality

👉 Counts unique values

{
  "aggs": {
    "unique_authors": {
      "cardinality": { "field": "author.keyword" }
    }
  }
}

✅ Use Case:

Unique users, unique categories

⚠️ Returns an approximate result (it trades exactness for speed and memory)
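
For comparison, this is what an exact unique count looks like in plain Python. It is only a reference point: OpenSearch estimates cardinality with the HyperLogLog++ algorithm, so on high-cardinality fields the real result can differ slightly from an exact count like this one. The author list is hypothetical.

```python
# Hypothetical author values, one per book.
authors = ["Alice", "Bob", "Alice", "Bob", "Charlie"]

# Exact distinct count: memory grows with the number of distinct values,
# which is the cost an approximate algorithm avoids.
unique_authors = len(set(authors))
print(unique_authors)  # 3
```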

ix) Percentiles

👉 Shows how the data is distributed

{
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [25, 50, 75, 95]
      }
    }
  }
}

✅ Output:

25th percentile
median (50%)
75th percentile
95th percentile
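
The sketch below shows one common way to compute percentiles exactly: sort the values and interpolate linearly between the two closest ranks. This is an assumption made for illustration. OpenSearch computes percentiles approximately (t-digest by default), so its numbers may not match this method exactly, especially on small datasets. The prices are hypothetical.

```python
# Hypothetical prices, invented for this sketch.
prices = [100, 120, 140, 180, 220]


def percentile(values, percent):
    """Exact percentile using linear interpolation between closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list is undefined")
    rank = (percent / 100) * (len(ordered) - 1)  # fractional index into ordered
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


price_percentiles = {p: percentile(prices, p) for p in [25, 50, 75, 95]}
print(price_percentiles)
```

Here the 50th percentile is the median price, and the 95th percentile sits between the two most expensive books.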

x) Percentile Ranks

👉 Tells you which percentile a given value falls at

{
  "aggs": {
    "price_ranks": {
      "percentile_ranks": {
        "field": "price",
        "values": [100, 200]
      }
    }
  }
}

✅ Example: a rank of 40% for price 100 means 40% of the values are at or below 100

percentile_ranks answers the question:

"What percentage of the dataset lies at or below each of these values?"

Percentile → give a percentage, get back a value
Percentile Rank → give a value, get back a percentage
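
The inverse relationship can be sketched in a few lines: a percentile rank is just the share of values at or below a threshold. This exact count is for illustration only; OpenSearch estimates ranks approximately, so real results may differ a little. The prices are hypothetical.

```python
# Hypothetical prices, invented for this sketch.
prices = [100, 120, 140, 180, 220]


def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to threshold."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100 * at_or_below / len(values)


price_ranks = {v: percentile_rank(prices, v) for v in [100, 200]}
print(price_ranks)  # {100: 20.0, 200: 80.0}
```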

xi) Top Hits

👉 Returns the top matching documents, typically per bucket when used as a sub-aggregation

{
  "aggs": {
    "top_books": {
      "top_hits": {
        "size": 2,
        "sort": [{ "price": { "order": "desc" } }]
      }
    }
  }
}

✅ Use Case:

Top selling products, latest posts
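
Conceptually, the request above sorts the matching documents by price in descending order and keeps the first two. A minimal sketch with hypothetical book documents:

```python
# Hypothetical book documents, invented for this sketch.
books = [
    {"title": "Book A", "price": 100},
    {"title": "Book B", "price": 120},
    {"title": "Book C", "price": 140},
    {"title": "Book D", "price": 180},
    {"title": "Book E", "price": 220},
]


def top_hits(documents, sort_field, size):
    """Sort descending by sort_field and keep the first `size` documents."""
    return sorted(documents, key=lambda d: d[sort_field], reverse=True)[:size]


top_books = top_hits(books, "price", 2)
print([b["title"] for b in top_books])  # ['Book E', 'Book D']
```

Inside a terms bucket, the same sort-and-truncate step runs once per bucket instead of once for the whole result set.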

xii) Geobounds

👉 Computes the bounding box that encloses the geo data

{
  "aggs": {
    "geo_bounds": {
      "geo_bounds": {
        "field": "location"
      }
    }
  }
}

✅ Output:

top_left coordinate, bottom_right coordinate
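
A bounding box is defined by two corners: the top-left one takes the largest latitude and the smallest longitude, and the bottom-right one takes the smallest latitude and the largest longitude. The sketch below uses hypothetical points and deliberately ignores boxes that cross the 180° meridian.

```python
# Hypothetical (lat, lon) points, invented for this sketch.
locations = [(40.7, -74.0), (34.0, -118.2), (41.9, -87.6)]


def geo_bounds(points):
    """Smallest lat/lon box containing all points (no date-line handling)."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


bounds = geo_bounds(locations)
print(bounds)
```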

xiii) Matrix Stats

👉 Statistics and correlation across multiple fields

{
  "aggs": {
    "matrix_stats": {
      "matrix_stats": {
        "fields": ["price", "quantity"]
      }
    }
  }
}

✅ Output:

covariance
correlation

✅ Use Case: Data relationship analysis
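
To show what covariance and correlation express, here is a pure-Python sketch over two hypothetical fields. The quantity values were chosen to fall on a straight line against price, so the correlation is easy to check by eye. Dividing by n - 1 (sample covariance) is this sketch's choice, not a statement about how OpenSearch normalizes its result.

```python
import math

# Hypothetical values; quantity was built as 60 - price / 4.
prices = [100, 120, 140, 180, 220]
quantities = [35, 30, 25, 15, 5]


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance (divides by n - 1); an assumption of this sketch."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled into the range -1 to 1."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


cov = covariance(prices, quantities)
corr = correlation(prices, quantities)
print(cov, corr)
```

A correlation close to -1 means quantity falls as price rises; close to +1 means they rise together; close to 0 means no linear relationship.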

xiv) Scripted Metric

👉 Runs an aggregation driven by your own script logic

{
  "aggs": {
    "custom_metric": {
      "scripted_metric": {
        "init_script": "state.total = 0",
        "map_script": "state.total += doc.price.value",
        "combine_script": "return state.total",
        "reduce_script": "double sum = 0; for (s in states) { sum += s } return sum"
      }
    }
  }
}

✅ Use Case:

Custom calculation
Complex business logic

⚠️ Advanced + performance heavy
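
The four scripts run in a fixed order: init and map on each shard, combine once per shard, and reduce once on the node that merges the shard results. The simulation below mimics that flow in plain Python with two hypothetical shards. It is a mental model of the phases, not the Painless runtime.

```python
# Two hypothetical shards, each holding some documents.
shards = [
    [{"price": 100}, {"price": 120}],
    [{"price": 140}, {"price": 180}, {"price": 220}],
]


def run_on_shard(docs):
    state = {"total": 0}                 # init_script: runs once per shard
    for doc in docs:                     # map_script: runs once per document
        state["total"] += doc["price"]
    return state["total"]                # combine_script: one value per shard


def reduce_states(states):
    """reduce_script: merge the per-shard values into the final result."""
    total = 0.0
    for s in states:
        total += s
    return total


states = [run_on_shard(docs) for docs in shards]
custom_metric = reduce_states(states)
print(states, custom_metric)  # [220, 540] 760.0
```

Because the map step runs a script for every matching document, this aggregation costs far more than the built-in sum that produces the same number.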


Summary Table

Aggregation	Purpose
avg	Average
sum	Total
min / max	Min / Max
stats	Basic stats
extended_stats	Advanced stats
value_count	Field count
cardinality	Unique count
percentiles	Distribution
percentile_ranks	Rank analysis
top_hits	Top documents
geo_bounds	Geo range
matrix_stats	Field relationship
scripted_metric	Custom logic

### What is Bucket Aggregation?

Bucket aggregation is OpenSearch's grouping mechanism: it divides documents into buckets.
Each bucket is one group and carries a doc_count.
It is mainly used for category-wise summaries and analytics.

Analogy:

Suppose you work in a bookstore and want to split the books by author, by price range, or by year.
Bucket aggregation does exactly that.

Bucket Aggregation	Description	Example
terms	Groups by field value	Author-wise book count
range	Groups by numeric ranges	Price 0-100, 100-200
histogram	Groups by fixed numeric intervals	Price in intervals of 50
date_histogram	Groups by date intervals	Monthly sales
filter	One bucket for a specific condition	Books with price > 100
filters	One bucket per condition	Books with price < 150, books with price >= 150

A. Terms Aggregation

The most common bucket aggregation.
Groups documents by the value of a field.
Typically used for categorical data (keyword, exact-match fields).

Example: Author-wise Book Count

GET /books/_search
{
  "size": 0,
  "aggs": {
    "books_by_author": {
      "terms": { "field": "author.keyword" }
    }
  }
}

Result:

{
  "buckets": [
    { "key": "Alice", "doc_count": 2 },
    { "key": "Bob", "doc_count": 2 },
    { "key": "Charlie", "doc_count": 1 }
  ]
}
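
The grouping itself is the same idea as counting occurrences of each value. The sketch below reproduces the buckets above from a hypothetical list of author values, ordered by doc_count from highest to lowest and limited to ten buckets.

```python
from collections import Counter

# Hypothetical author value of each book document.
authors = ["Alice", "Bob", "Alice", "Bob", "Charlie"]


def terms(values, size=10):
    """Group equal values and keep the `size` largest groups."""
    return [
        {"key": key, "doc_count": count}
        for key, count in Counter(values).most_common(size)
    ]


buckets = terms(authors)
print(buckets)
```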

B. Range Aggregation

Groups a numeric field into ranges that you define. Each range includes its from value and excludes its to value.
Examples: price ranges, age ranges, salary ranges.

Example: Books by Price Range

GET /books/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 150 },
          { "from": 150, "to": 200 },
          { "from": 200 }
        ]
      }
    }
  }
}

Result:

{
  "buckets": [
    { "key": "*-150.0", "doc_count": 3 },
    { "key": "150.0-200.0", "doc_count": 1 },
    { "key": "200.0-*", "doc_count": 1 }
  ]
}
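
The sketch below mirrors the range logic in plain Python, treating from as inclusive and to as exclusive, so a price of exactly 150 lands in the second bucket and not the first. The prices are hypothetical values chosen to reproduce the counts above.

```python
import math

# Hypothetical prices, invented for this sketch.
prices = [100, 120, 140, 180, 220]
ranges = [{"to": 150}, {"from": 150, "to": 200}, {"from": 200}]


def range_key(r):
    """Build a key such as '*-150.0' or '150.0-200.0'."""
    lower = str(float(r["from"])) if "from" in r else "*"
    upper = str(float(r["to"])) if "to" in r else "*"
    return f"{lower}-{upper}"


def range_buckets(values, ranges):
    buckets = {}
    for r in ranges:
        lower = r.get("from", -math.inf)  # inclusive
        upper = r.get("to", math.inf)     # exclusive
        buckets[range_key(r)] = sum(1 for v in values if lower <= v < upper)
    return buckets


price_ranges = range_buckets(prices, ranges)
print(price_ranges)  # {'*-150.0': 3, '150.0-200.0': 1, '200.0-*': 1}
```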

C. Histogram Aggregation

Groups a numeric field into fixed-size intervals.
Example: prices in intervals of 50.

Example: Price Histogram (Interval=50)

GET /books/_search
{
  "size": 0,
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "price",
        "interval": 50
      }
    }
  }
}

Result: buckets keyed by their lower bound (100, 150, 200, ...), each covering values from the key up to, but not including, key + 50.

NOTE: interval is the width (size) of each bucket.

Suppose you have many amounts of money to sort. If you set interval to 5000, you are asking for a new envelope (bucket) every 5000 taka.

How does it work?

The histogram aggregation splits your data at fixed gaps. Consider this example:

With interval: 5000, the buckets look like this:

Bucket 1: 0 up to (but not including) 5000
Bucket 2: 5000 up to (but not including) 10000
Bucket 3: 10000 up to (but not including) 15000

Difference between `range` and `histogram`:

Range: you write out every bucket by hand (for example 5000-10000, 11000-15000). That is error-prone, and values that fall in a gap between ranges (here, 10000 up to 11000) are silently left out.
Histogram: you only specify the interval. Bucket boundaries are aligned to multiples of the interval (0, 5000, 10000, ...) and the buckets are created automatically, so no value that has the field can fall into a gap.
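
The bucket a value belongs to follows from one formula: round the value down to the nearest multiple of the interval. The sketch below applies that formula to hypothetical prices with an interval of 50 and assumes no offset is configured.

```python
import math
from collections import Counter

# Hypothetical prices, invented for this sketch.
prices = [100, 120, 140, 180, 220]


def bucket_key(value, interval):
    """Lower bound of the bucket: the value rounded down to a multiple of interval."""
    return math.floor(value / interval) * interval


def histogram(values, interval):
    counts = Counter(bucket_key(v, interval) for v in values)
    return dict(sorted(counts.items()))


price_histogram = histogram(prices, 50)
print(price_histogram)  # {100: 3, 150: 1, 200: 1}
```

Because every number maps to exactly one key, there is no way to leave a gap by accident, which is the practical advantage over hand-written ranges.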

D. Date Histogram Aggregation

Groups date/time fields by interval (day, month, year).
Often used for time-series analytics.

Example: Books by Publication Year

GET /books/_search
{
  "size": 0,
  "aggs": {
    "books_per_year": {
      "date_histogram": {
        "field": "year",
        "calendar_interval": "year"
      }
    }
  }
}

Result: one bucket per calendar year with its doc_count. This assumes the year field is mapped as a date type.
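
A calendar-year date histogram rounds each date down to January 1 of its year and counts the documents per rounded date. The sketch below does that for a few hypothetical publication dates.

```python
from collections import Counter
from datetime import date

# Hypothetical publication dates, invented for this sketch.
published = [
    date(2019, 3, 10),
    date(2019, 11, 2),
    date(2020, 6, 15),
    date(2021, 1, 20),
    date(2021, 12, 31),
]


def year_bucket(day):
    """Round a date down to the first day of its calendar year."""
    return date(day.year, 1, 1).isoformat()


books_per_year = dict(sorted(Counter(year_bucket(d) for d in published).items()))
print(books_per_year)  # {'2019-01-01': 2, '2020-01-01': 1, '2021-01-01': 2}
```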

E. Filter Aggregation

Creates a single bucket holding only the documents that match a given condition.

Example: Books with price > 150

GET /books/_search
{
  "size": 0,
  "aggs": {
    "expensive_books": {
      "filter": { "range": { "price": { "gt": 150 } } }
    }
  }
}

Result: doc_count = number of books with price > 150

F. Filters Aggregation

Creates several buckets at once, one per condition.
Multiple filters are evaluated in a single request.

Example: Books with price < 150 and price >= 150

GET /books/_search
{
  "size": 0,
  "aggs": {
    "price_filters": {
      "filters": {
        "filters": {
          "cheap": { "range": { "price": { "lt": 150 } } },
          "expensive": { "range": { "price": { "gte": 150 } } }
        }
      }
    }
  }
}

Result: cheap=3, expensive=2
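
Each named filter is an independent yes/no test, and every document is counted in each bucket whose test it passes. The sketch below models the two filters as Python predicates over hypothetical prices chosen to reproduce the counts above.

```python
# Hypothetical prices, invented for this sketch.
prices = [100, 120, 140, 180, 220]

# One named predicate per bucket, mirroring the "cheap" and "expensive" filters.
filters = {
    "cheap": lambda price: price < 150,
    "expensive": lambda price: price >= 150,
}

price_filters = {
    name: sum(1 for p in prices if matches(p))
    for name, matches in filters.items()
}
print(price_filters)  # {'cheap': 3, 'expensive': 2}
```

Unlike range buckets, these conditions may overlap or test different fields, so one document can appear in several buckets.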

G. Nested Aggregation

When documents contain nested objects (an array of objects mapped with the nested field type), use the nested aggregation.

Example: Each book has multiple reviews

POST /books/_doc/1
{
  "title": "Book A",
  "reviews": [
    { "user": "U1", "rating": 5 },
    { "user": "U2", "rating": 4 }
  ]
}

GET /books/_search
{
  "size": 0,
  "aggs": {
    "nested_reviews": {
      "nested": { "path": "reviews" },
      "aggs": {
        "avg_rating": { "avg": { "field": "reviews.rating" } }
      }
    }
  }
}

Result: average rating across all reviews
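
The key point is that the average is taken over individual reviews, not over books. The sketch below flattens the reviews first and then averages. Book B is a hypothetical second document added only so that the two views differ.

```python
# Book A comes from the example above; Book B is hypothetical.
books = [
    {"title": "Book A", "reviews": [{"user": "U1", "rating": 5},
                                    {"user": "U2", "rating": 4}]},
    {"title": "Book B", "reviews": [{"user": "U3", "rating": 3}]},
]

# Step into the nested path: every review becomes its own unit.
ratings = [review["rating"] for book in books for review in book["reviews"]]

nested_doc_count = len(ratings)            # number of reviews, not books
avg_rating = sum(ratings) / len(ratings)
print(nested_doc_count, avg_rating)  # 3 4.0
```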

H. Geo Distance Aggregation

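Groups documents into buckets by their distance from an origin point, using a location (geo_point) field. The from and to values are in meters unless a unit is specified.
Use case: geospatial analytics

Example: Users' distance from the city center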

GET /users/_search
{
  "size": 0,
  "aggs": {
    "distance_buckets": {
      "geo_distance": {
        "field": "location",
        "origin": "40,-70",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 200 },
          { "from": 200 }
        ]
      }
    }
  }
}
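
To illustrate the bucketing, the sketch below measures each point's distance from the origin with the haversine formula and then applies the same three ranges in meters. The user coordinates and the Earth radius constant are assumptions of this sketch, and OpenSearch's own distance calculation may give slightly different numbers.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


origin = (40.0, -70.0)
# Hypothetical user locations: roughly 0 m, 56 m, 145 m, and 1.1 km north of the origin.
users = [(40.0, -70.0), (40.0005, -70.0), (40.0013, -70.0), (40.01, -70.0)]
ranges = [{"to": 100}, {"from": 100, "to": 200}, {"from": 200}]


def distance_bucket_counts(points, origin, ranges):
    counts = [0] * len(ranges)
    for lat, lon in points:
        distance = haversine_m(origin[0], origin[1], lat, lon)
        for i, r in enumerate(ranges):
            # from is inclusive, to is exclusive
            if r.get("from", 0) <= distance < r.get("to", math.inf):
                counts[i] += 1
    return counts


distance_buckets = distance_bucket_counts(users, origin, ranges)
print(distance_buckets)  # [2, 1, 1]
```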

3. Nested Aggregation (Bucket + Metric)

Example: Author-wise Average Price

GET /books/_search
{
  "size": 0,
  "aggs": {
    "books_by_author": {
      "terms": { "field": "author.keyword" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}

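This is a textbook example of combining a bucket aggregation with a metric aggregation: the avg runs once inside each author bucket. Note that this kind of nesting (a sub-aggregation) is different from the nested aggregation in section G, which targets nested-type fields.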
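
The two-step flow (group first, then compute a metric per group) can be sketched as follows. The book documents are hypothetical.

```python
from collections import defaultdict

# Hypothetical book documents, invented for this sketch.
books = [
    {"author": "Alice", "price": 100},
    {"author": "Bob", "price": 120},
    {"author": "Alice", "price": 140},
    {"author": "Bob", "price": 180},
    {"author": "Charlie", "price": 220},
]

# Step 1 (bucket): collect prices per author.
prices_by_author = defaultdict(list)
for book in books:
    prices_by_author[book["author"]].append(book["price"])

# Step 2 (metric): compute the average inside each bucket.
books_by_author = {
    author: {"doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
    for author, prices in prices_by_author.items()
}
print(books_by_author)
```

Swapping the metric in step 2 (for example to max or stats) changes what is reported per author without touching the grouping.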

4. Tips & Best Practices

Use size: 0 when you only want the aggregation results.
Use the .keyword sub-field when running a terms aggregation on a text field.
Large datasets → terms returns only the top 10 buckets by default (size=10); increase size if needed.
Nested buckets → multi-level analytics are possible.
filter and filters → scope an aggregation to a subset of documents without changing the main query.
Date histogram → makes time series easy to visualize.
